Dissociating visual form from lexical frequency using Japanese

In Japanese, the same word can be written in either morphographic Kanji or syllabographic Hiragana and this provides a unique opportunity to disentangle a word's lexical frequency from the frequency of its visual form - an important distinction for understanding the neural information processing in regions engaged by reading. Behaviorally, participants responded more quickly to high than low frequency words and to visually familiar relative to less familiar words, independent of script. Critically, the imaging results showed that visual familiarity, as opposed to lexical frequency, had a strong effect on activation in ventral occipito-temporal cortex. Activation here was also greater for Kanji than Hiragana words and this was not due to their inherent differences in visual complexity. These findings can be understood within a predictive coding framework in which vOT receives bottom-up information encoding complex visual forms and top-down predictions from regions encoding non-visual attributes of the stimulus.


Introduction
Of all the world's languages, Japanese uniquely relies on multiple written scripts for its everyday use. Books, magazines, and advertisements all mix morphographic Kanji with syllabographic Hiragana such that no adult text consists solely of one script (although some children's books are written only in Hiragana). As a result, Japanese adults are equally familiar with both scripts and individual words can be written multiple ways. For instance, a word such as "apple" is as common in Hiragana (りんご) as Kanji (林檎). In many cases, however, one form will be more common than the other. For instance, "mischief" is usually written in Hiragana (わんぱく) but sometimes occurs as Kanji (腕白). The fact that the same word can be written in different scripts means that Japanese offers a unique opportunity to disentangle the frequency of a written word (i.e. its lexical frequency) from the frequency of its visual form (i.e. its visual familiarity). In alphabetic languages, on the other hand, written lexical frequency and visual familiarity are essentially the same thing: both measure how frequently a word appears in print. Consequently, much of the literature focuses on lexical frequency as a key factor in understanding the nature of the neural information processing in brain regions related to reading (Fiebach, Friederici, Müller, & Von Cramon, 2002; Hauk, Davis, & Pulvermüller, 2008), despite the possibility that visual familiarity and lexical frequency may have differential effects. This is particularly relevant to theories of the left ventral occipito-temporal cortex (vOT), an area consistently engaged by visual word recognition (Price & Mechelli, 2005). Here, activation is greater for low than high frequency words (Chee, Westphal, Goh, Graham, & Song, 2003; Hauk et al., 2008; Joubert et al., 2004; Kronbichler et al., 2004) and this effect is not limited to alphabetic languages but is also seen in Chinese (Kuo et al., 2003; Lee et al., 2004).
Some accounts claim that the area stores and processes language-specific information (Glezer, Jiang, & Riesenhuber, 2009; Kronbichler et al., 2004). Indeed, Kronbichler and colleagues (2004) have argued that vOT is the site of the "orthographic input lexicon" where orthographic word representations that abstract away from details of the visual form are stored (Bruno, Zumberge, Manis, Lu, & Goldman, 2008; Glezer et al., 2009; Kronbichler et al., 2004, 2007; van der Mark et al., 2009). Like logogens (Morton, 1969), entries in this orthographic input lexicon are sensitive to experience, with access to less frequent words requiring greater effort and therefore resulting in greater activation. An alternate explanation is that vOT represents visual form more generally and is not specialized for written words (Price & Devlin, 2011). By this account, the same vOT neurons that represent spatial configurations important to written words also contribute to other visual stimuli such as objects, scenes and faces. Reciprocal connections with higher order association areas link these visual representations with non-visual properties of the stimulus such as its sound (phonology) or meaning (semantics). As a result, frequency effects arise from the interaction of bottom-up and top-down constraints. Specifically, high frequency written words are more familiar visual patterns and thus have more accurate top-down predictions into vOT, reducing prediction error and therefore activation. In contrast, low frequency words result in greater prediction error, increasing the processing demands on vOT and thereby increasing the activation (Price & Devlin, 2011).

0093-934X © 2012 Elsevier Inc. doi:10.1016/j.bandl.2012.02.003
The aim of the current study was to test this Interactive Account using Japanese to differentiate between the frequency of a word and its visual familiarity. This distinction is not possible in most languages as individual words can only be correctly written one way. For instance, in alphabetic languages like English, it is possible to write a word phonetically (e.g. "brane") but literate readers immediately recognize such forms as incorrectly spelled, which is very different from seeing two different forms of a correctly spelled word. Another possibility would be to test bilinguals with the same word in different languages (e.g. "米" in Japanese and "rice" in English) but again, this is not optimal since the difference in script is confounded by a difference in language. In this example, the semantic properties of "米" and "rice" are not identical because, unlike the English word, the Japanese word means "uncooked rice" and there is a separate word for "cooked rice". In other words, the "same" words in two different languages are often subtly different, confounding differences in their visual form. Therefore, only those languages that allow a word to be written in multiple forms suffice for dissociating visual familiarity from lexical frequency. Although Chinese (Hanzi vs. Pinyin) and Korean (Hanja vs. Hangul) offer this possibility, in these languages only one form is in daily use, making one script much more familiar than the other. Japanese, on the other hand, is unique in its reliance on multiple forms for everyday use and therefore provides a fertile ground for testing theories of vOT function in reading.
According to the Interactive Account, visual familiarity is expected to strongly modulate activation in vOT, consistent with the hypothesis that the region plays a more domain-general role in representing visual patterns, of which written words are only one example. Lexical frequency, on the other hand, is predicted to interact with visual familiarity such that only low frequency words are affected by visual familiarity. For highly frequent words, vOT will receive sufficiently accurate top-down predictions to quickly and accurately match the bottom-up visual information regardless of whether the visual form is more or less familiar. In contrast, less frequent words will send less accurate top-down predictions to vOT, resulting in greater prediction error. Consequently, activation in vOT for low frequency words will benefit from greater visual familiarity. Finally, the Interactive Account makes one further prediction. Namely, consistent with previous studies (Ha Duy Thuy et al., 2004; Ino, Nakai, Azuma, Kimura, & Fukuyama, 2009; Nakamura, Dehaene, Jobert, Le Bihan, & Kouider, 2005; Sakurai et al., 2000), it predicts greater vOT activation for Kanji relative to Hiragana, due to differences in the top-down signals. Specifically, the inconsistent mapping between Kanji characters and their phonology results in greater prediction error than for Hiragana words, where there is a consistent mapping between characters and syllables. Consequently, the magnitude of the BOLD signal will be greater for Kanji than for Hiragana.
The current experimental design aimed to evaluate these hypotheses. Participants performed a lexical decision task where they decided whether visual stimuli represented real Japanese words. Stimuli were written in either Kanji or Hiragana and fully crossed with visual familiarity. In other words, each word appeared twice in the course of the experiment, once in Kanji and once in Hiragana. Half of the words were more commonly written in Kanji while the other half were more common in Hiragana. This design allowed us to look for main effects of Visual Familiarity (high vs. low) and Script (Kanji vs. Hiragana) as well as their potential interaction. A second analysis recoded all the words into four sets divided according to lexical frequency (i.e. independent of script) and looked for frequency effects and their interaction with visual familiarity. In this fashion, we could independently evaluate the effects of visual familiarity and lexical frequency on vOT as well as look for potential script differences.

Participants
Forty native Japanese speakers participated in this study, although the data from six were excluded due to either excessive motion inside the scanner (i.e. motion greater than the dimensions of a voxel; n = 3 participants) or poor performance (i.e. accuracy less than 60% in one or more conditions; n = 3 participants). Consequently, only data from 34 participants (13M, 21F, aged 21-62) were included in the final analyses. Since we hypothesized that the amount of exposure to written Japanese may affect the activation, we tested two groups of native Japanese readers: university students in Tokyo with daily exposure to written Japanese (n = 15, 10M, 5F, aged 21-31) and Japanese expatriates who had lived outside Japan for a minimum of 3 years and thus had reduced exposure to written Japanese in their daily lives (n = 19, 3M, 16F, aged 29-62). The imaging analyses, however, revealed no significant interactions between Group (Tokyo vs. London) and any other factor. Consequently, the results presented here collapse over Group, although Group was retained as a factor in the analyses to better model structured variance in the data.
All participants were native Japanese speakers born and educated in Japan through at least secondary school. Consequently, all were literate adult readers of Japanese, familiar with both Kanji and Hiragana. In addition, all were right-handed except for one who was confirmed to be ambidextrous according to the Edinburgh Handedness Inventory (Oldfield, 1971). None reported a history of reading difficulties or neurological problems. Each of the London participants had lived outside of Japan, China and Korea (where morphographic or logographic scripts are used) for at least 3 years (range: 3-34 years, mean = 11). Testing in Tokyo was approved by the ethics committee of the Graduate School of Medicine, the University of Tokyo (#2968), and the ethics committee of the Brain Science Institute, Tamagawa University (C21-4). In London, ethical approval was granted by the NHS Berkshire Research Ethics Committee (06/Q1602/20).

Experimental procedures
The participants' task was to view strings of characters and decide whether the string formed an existing Japanese word or not. The task involved 60 words, each of which was presented twice: once in Kanji and once in Hiragana. One half of the words were most commonly written in Kanji and the other half were most commonly written in Hiragana. An equal number of nonwords, divided evenly between Kanji and Hiragana, were included to ensure adequate task performance.
A trial began with a fixation cross presented for 500 ms. A stimulus (written horizontally using the MS Gothic font) was then presented for 500 ms, followed by a jittered inter-stimulus interval of 1-4 s (mean = 2.5 s). Therefore, the average trial length was 3.5 s. Stimuli were presented in blocks of 15 trials (lasting 54 s) which included both "yes" and "no" responses in a pseudorandomized order. These were separated by 15 s blocks of fixation that served as an implicit baseline. Over a run, there were eight blocks of task and eight blocks of rest. Therefore, each run lasted 9 min and 12 s. There were two runs. Participants responded with a button press, using the index or middle finger of the right hand to indicate "yes" or "no". The response fingers were fully counterbalanced across participants. The stimuli were projected onto a screen and viewed via mirrors attached to the head coil. Participants practiced the task inside the scanner before the main runs began. No items that were used in the practice runs occurred during the main experiment.
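The timing figures above can be checked with a few lines of arithmetic. The sketch below uses only the durations stated in the text; the constant and function names are illustrative and not part of the original protocol.

```python
# Durations as stated in the text; names are hypothetical.
FIXATION_S = 0.5      # fixation cross
STIMULUS_S = 0.5      # stimulus presentation
MEAN_ISI_S = 2.5      # mean of the jittered 1-4 s inter-stimulus interval
TASK_BLOCK_S = 54     # one block of 15 trials
REST_BLOCK_S = 15     # one fixation block
BLOCKS_PER_RUN = 8    # eight task blocks and eight rest blocks per run
N_RUNS = 2


def mean_trial_length_s():
    """Average trial length: fixation + stimulus + mean ISI."""
    return FIXATION_S + STIMULUS_S + MEAN_ISI_S


def run_length_s():
    """Run length: eight task blocks, each paired with one rest block."""
    return BLOCKS_PER_RUN * (TASK_BLOCK_S + REST_BLOCK_S)


def as_min_sec(seconds):
    """Convert whole seconds to a (minutes, seconds) pair."""
    return divmod(int(seconds), 60)


if __name__ == "__main__":
    print(mean_trial_length_s())               # average trial length in s
    print(as_min_sec(run_length_s()))          # one run
    print(as_min_sec(N_RUNS * run_length_s())) # both runs together
```

Under these assumptions one run comes to 552 s, which matches the stated 9 min 12 s, and two runs match the 18 min 24 s quoted in the acquisition section.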

Stimuli
The word stimuli were obtained from the NTT Japanese Psycholinguistic Database (Amano & Kondo, 2003a, 2003b) by identifying 30 words that had a higher visual familiarity score when written in Kanji than any other script (including Hiragana, Katakana and mixed scripts). Another set of 30 words was found that had a higher visual familiarity score when written in Hiragana than any other script. The Kanji words were then transliterated into Hiragana ("展望" → "てんぼう") and the Hiragana words transliterated into Kanji ("とんち" → "頓知"), producing 120 words (60 written in Kanji, 60 written in Hiragana). It is important to note that while this transcription changes the visual familiarity of the word, the lexical frequency of the word remains constant. All words were 2 or 3 characters in length when written in Kanji, and between 2 and 6 when written in Hiragana. The resulting word set had four different conditions, each with 30 items (high visual familiarity Kanji words, low visual familiarity Kanji words, high visual familiarity Hiragana words, and low visual familiarity Hiragana words) corresponding to a 2 × 2 factorial design with Script (Kanji, Hiragana) and Visual Familiarity (High, Low) as factors.
The stimuli were carefully matched along several different dimensions summarized in Table 1a. These values, from the NTT database (Amano & Kondo, 2003a, 2003b), were analysed with a 2 × 2 ANOVA. Across conditions words were matched for mora length (a measure of phonological complexity). In addition, the analysis confirmed the main effect of Visual Familiarity and demonstrated that this did not interact with Script. Conceptual familiarity (derived by summing familiarity ratings for the visual word and for its auditory form) was matched across Script but naturally it was not possible to match across Visual Familiarity. Finally, Hiragana words had significantly more characters but fewer strokes than Kanji words, which is an inevitable difference between the scripts. It is worth noting, however, that effects of visual complexity and word length are expected to manifest in early visual cortices (Hsu, Lee, & Marantz, 2011; Mechelli, Humphreys, Mayall, Olson, & Price, 2000; Tarkiainen, Helenius, Hansen, Cornelissen, & Salmelin, 1999) rather than in higher order visual regions like vOT.
The same stimuli were then re-grouped according to their lexical frequency, a script-independent measure of how often the word occurs in print regardless of its visual form (i.e. Kanji or Hiragana). Lexical frequency values were calculated by summing the frequency of the Kanji and Hiragana word forms, taken from the NTT database (Amano & Kondo, 2003a, 2003b). For example, the Japanese word pronounced /tembo:/ has a lexical frequency value of 6984, since its written Kanji form ("展望") has a frequency of 6979 and its Hiragana form ("てんぼう") has a frequency value of 5. The frequencies for the 60 lexical items were then divided into quartiles so that those within the lower quartile (i.e. low frequency words) could be compared to those within the upper quartile (i.e. high frequency words) in order to maximize the distinction between them. Because the stimuli were originally chosen according to their visual familiarity scores across scripts, the distribution of the (log of the) lexical frequencies was nearly uniform over the 60 words. Note that it proved impossible to fully balance visual familiarity, script and lexical frequency into a factorial design, forcing us to interrogate the data in two separate analyses. In order to separate visual familiarity into high and low, each lexical item needed to be presented in both scripts. In contrast, lexical frequency was independent of script. As a result, a full factorial design that included Visual Familiarity, Lexical Frequency and Script was impossible to generate. There were no significant differences between high and low frequency items in terms of number of mora, number of characters, or total stroke count (Table 1b).
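As a rough illustration of this re-grouping, the sketch below sums the two script-specific frequencies and splits a ranked word list into four equal-sized groups. Only the /tembo:/ values (6979 and 5) come from the text; the function names, the tie-handling, and the toy word list are assumptions made for the example, and the authors' exact quartile procedure is not specified.

```python
def lexical_frequency(kanji_freq, hiragana_freq):
    """Script-independent frequency: sum of the two written-form counts."""
    return kanji_freq + hiragana_freq


def quartile_groups(freq_by_word):
    """Rank words by lexical frequency and cut the ranking into four groups.

    Assumes the number of words divides reasonably into four; ties keep
    their original order (hypothetical choice, not from the paper).
    """
    ranked = sorted(freq_by_word, key=freq_by_word.get)
    n = len(ranked)
    bounds = [(i * n) // 4 for i in range(5)]
    labels = ["Lower", "Lower middle", "Upper middle", "Upper"]
    return {
        label: ranked[bounds[i]:bounds[i + 1]]
        for i, label in enumerate(labels)
    }


if __name__ == "__main__":
    # Worked example from the text: /tembo:/
    print(lexical_frequency(6979, 5))

    # Toy frequencies for eight placeholder words (made up for illustration).
    toy = {f"word{i}": f for i, f in enumerate([3, 40, 7, 900, 15, 6984, 120, 2])}
    for label, words in quartile_groups(toy).items():
        print(label, words)
```

With 60 items this split yields 15 words per quartile, so the Lower versus Upper contrast compares the 15 least frequent with the 15 most frequent words.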
Finally, Kanji nonwords were created by combining random Kanji characters that together did not form a word. These were matched 1:1 with the real Kanji words for number of strokes and characters. Hiragana nonwords were created by combining random Hiragana characters that together did not form a word. These were matched 1:1 with the real Hiragana words for number of strokes and characters.

MRI acquisition
For the subjects scanned in Tokyo, whole-brain imaging was performed on a Siemens 3T MRI scanner at the Brain Science Research Center at Tamagawa University. The functional data were acquired with a gradient-echo EPI sequence (TR = 3000 ms; TE = 25 ms; FOV = 192 mm; matrix = 64 × 64) giving a notional resolution of 3 × 3 × 3 mm. For participants in London, whole-brain imaging was performed on a Siemens 1.5T MRI scanner at the Birkbeck-UCL Neuroimaging (BUCNI) centre. The functional data were acquired with a gradient-echo EPI sequence (TR = 3000 ms; TE = 50 ms; FOV = 192 mm; matrix = 64 × 64) giving a notional resolution of 3 × 3 × 3 mm. In both cases, a run consisted of 186 volumes and as a result the two runs together took 18 min 24 s. In addition, a high-resolution (1 mm³) T1-weighted anatomical scan was acquired for localizing the functional data on the individual's brain anatomy.

[Table footnote, partial] ... were the same for these conditions. Consequently, differences were assessed with a t-test rather than an ANOVA and are marked in italics.

Analyses
In both the behavioral and imaging data, items whose accuracy was at chance (≤50%) were excluded from all analyses (n = 9) and only correct trials were analysed. Reaction times (RTs) were recorded from the onset of the stimulus and anticipatory responses (i.e. RTs < 300 ms) were trimmed (0.05% of trials). To minimize the effect of outliers, median RTs per condition per subject were used in the statistical analyses (Ulrich & Miller, 1994). The behavioral data were analysed using a mixed 2 × 2 × 2 analysis of variance (ANOVA) with Script (Kanji, Hiragana) and Visual Familiarity (High, Low) as within-subject factors and Group (Tokyo, London) as a between-subject factor. Accuracy and RTs were the dependent measures. In addition, the behavioral data were re-grouped into quartiles according to the lexical frequency of the stimuli and analysed using a mixed 4 × 2 × 2 ANOVA with Lexical Frequency (Upper, Upper Middle, Lower Middle, Lower) and Visual Familiarity (High, Low) as within-subject factors and Group (Tokyo, London) as a between-subject factor.
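The RT preprocessing described here (keep correct trials, drop anticipations below 300 ms, then take the median per condition) can be sketched as follows for a single subject. The trial tuple layout, the condition labels and the function name are hypothetical; item-level exclusion of at-chance items is assumed to have been applied beforehand.

```python
from statistics import median

ANTICIPATION_MS = 300  # RTs below this are treated as anticipatory


def condition_medians(trials):
    """Median RT per condition for one subject.

    trials: iterable of (condition, rt_ms, correct) tuples (assumed layout).
    Incorrect trials and anticipatory responses are discarded first.
    """
    kept = {}
    for condition, rt_ms, correct in trials:
        if correct and rt_ms >= ANTICIPATION_MS:
            kept.setdefault(condition, []).append(rt_ms)
    return {condition: median(rts) for condition, rts in kept.items()}


if __name__ == "__main__":
    # Made-up trials for illustration only.
    trials = [
        ("kanji_high", 250, True),    # anticipatory, trimmed
        ("kanji_high", 700, True),
        ("kanji_high", 800, True),
        ("kanji_high", 900, False),   # incorrect, excluded
        ("kanji_high", 2000, True),   # slow outlier, kept
        ("hiragana_low", 760, True),
        ("hiragana_low", 840, True),
    ]
    print(condition_medians(trials))
```

Using the median rather than the mean means the single 2000 ms trial in this toy example does not pull the condition estimate upwards, which is the motivation given in the text.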
The imaging data were processed using SPM8 (Wellcome Trust Centre for Neuroimaging, London UK, http://www.fil.ion.ucl.ac.uk/spm/). The first four volumes in the Tokyo (i.e. 3T) data and two volumes in the London (i.e. 1.5T) data were discarded in order to allow for T1 equilibrium. All functional volumes were spatially realigned and unwarped to adjust for minor distortions in the B0 field due to head movement (Andersson, Hutton, Ashburner, Turner, & Friston, 2001). They were then normalized to the MNI-152 EPI template, maintaining the original 3 × 3 × 3 mm resolution. Finally, images were smoothed with an isotropic 8 mm full-width half-maximum Gaussian kernel. Time-series from each voxel were high-pass filtered (1/128 Hz cutoff) to remove low-frequency noise and signal drift. The preprocessed functional volumes were then analysed in two separate GLMs. One investigated the effects of visual familiarity and script while the other investigated the effect of lexical frequency and visual familiarity. In both cases, a first-level, fixed-effects analysis combined the two runs from each participant and the estimated effect sizes were entered into a second-level, random-effects analysis to estimate the population effect. At the first level, the onsets of stimuli were modelled as delta functions convolved with a canonical haemodynamic response function (Glover, 1999), which provided regressors for the general linear model. The appropriate contrast images, averaged over sessions, were then generated in all subjects for each condition versus fixation.
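To make the regressor construction concrete, the sketch below convolves stimulus onsets (delta functions) with a double-gamma response and samples the result once per TR. It is not the SPM8 implementation: the double-gamma shape parameters (6 and 16, undershoot ratio 1/6), the 0.1 s working resolution, the 32 s kernel length and all function names are assumptions chosen for illustration; only the 3 s TR comes from the text.

```python
import math


def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1.0 / 6.0):
    """Double-gamma response at time t (seconds); parameters are assumed."""
    if t <= 0:
        return 0.0

    def gamma_pdf(shape):
        # Gamma density with unit scale.
        return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)

    return gamma_pdf(peak_shape) - ratio * gamma_pdf(undershoot_shape)


def build_regressor(onsets_s, n_scans, tr_s=3.0, dt=0.1, hrf_len_s=32.0):
    """Convolve onset delta functions with the HRF and sample once per TR."""
    n_fine = int(round(n_scans * tr_s / dt))
    sticks = [0.0] * n_fine
    for onset in onsets_s:
        idx = int(round(onset / dt))
        if 0 <= idx < n_fine:
            sticks[idx] += 1.0

    kernel = [double_gamma_hrf(i * dt) for i in range(int(round(hrf_len_s / dt)))]

    convolved = [0.0] * n_fine
    for i, amplitude in enumerate(sticks):
        if amplitude:
            for j, k in enumerate(kernel):
                if i + j < n_fine:
                    convolved[i + j] += amplitude * k

    step = int(round(tr_s / dt))
    return convolved[::step]  # one value per scan


if __name__ == "__main__":
    regressor = build_regressor(onsets_s=[0.0], n_scans=10)
    print([round(v, 4) for v in regressor])
```

For a single event at time zero, the sampled response rises over the first few scans and then dips below baseline, which is the qualitative shape the GLM uses to predict the BOLD signal for each condition.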
The first analysis included four word conditions (Kanji high visual familiarity, Kanji low visual familiarity, Hiragana high visual familiarity and Hiragana low visual familiarity), two nonword conditions (Kanji, Hiragana) and a condition for incorrect and excluded trials (Murphy & Garavan, 2004). Fixation was not modelled and served as an implicit baseline. The four word-relative-to-rest contrasts were computed and entered into a second-level, 2 × 2 × 2 ANOVA with Script (Kanji, Hiragana) and Visual Familiarity (High, Low) as within-subject factors and Group (Tokyo, London) as a between-subject factor. We first identified areas of common activations for all eight word conditions using a linear contrast to compute their mean activity and inclusively masking it with each condition relative to fixation at p = 0.001. From this analysis we computed statistical contrasts of the two conditions within a factor, inclusively masking them with common activations of these conditions at p = 0.05.
The second analysis investigated the effect of lexical frequency independent of script. Here, words were divided into quartiles based on their frequency and entered into a second-level, 4 × 2 × 2 ANOVA with Lexical Frequency (Upper, Upper middle, Lower middle, Lower) and Visual Familiarity (High, Low) as within-subject factors and Group (Tokyo, London) as a between-subject factor.
Since the primary aim of the current study was to investigate effects in vOT, we defined an a priori anatomical mask for this region-of-interest (ROI). The main anatomical areas of interest were the occipitotemporal sulcus and adjacent regions on the crests of the fusiform and inferior temporal gyri: areas consistently activated by visual word recognition tasks (Bitan et al., 2007; Cai, Paulignan, Brysbaert, Ibarrola, & Nazir, 2010; Cohen et al., 2000; Devlin, Jamison, Gonnerman, & Matthews, 2006; Duncan, Pattamadilok, Knierim, & Devlin, 2009; Fiez & Petersen, 1998; Frost et al., 2005; Herbster, Mintun, Nebes, & Becker, 1997; Kronbichler et al., 2007; Price, Wise, & Frackowiak, 1996; Rumsey et al., 1997; Shaywitz et al., 2004). Because the precise coordinates vary along a rostro-caudal axis, standard space coordinates ranging from X = −36 to −54 and Y = −45 to −66 were used to delineate this region. In addition, the depth of the sulcus coupled with the fact that the temporal lobe is angled downwards required a range of Z-coordinates as well (Z = −30 to −6). Together these coordinates describe a rectangular prism that conservatively encompassed the region of vOT sensitive to visual word recognition as well as the anatomically adjacent lobule VI of the cerebellum. Because the cerebellum was both anatomically and functionally distinct, it was manually removed from the ROI mask.
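The rectangular prism defined by these coordinate ranges can be expressed as a simple membership test on MNI coordinates. The sketch below uses the X, Y and Z limits given in the text and treats the limits as inclusive (an assumption); the function and constant names are hypothetical, and the manual removal of cerebellar voxels is not modelled.

```python
# Coordinate limits in MNI space (mm), taken from the text.
X_RANGE = (-54, -36)
Y_RANGE = (-66, -45)
Z_RANGE = (-30, -6)


def in_vot_box(x, y, z):
    """True if an MNI coordinate lies inside the a priori bounding box.

    Limits are treated as inclusive. The cerebellar voxels that were removed
    by hand in the original mask are NOT excluded here.
    """
    return (
        X_RANGE[0] <= x <= X_RANGE[1]
        and Y_RANGE[0] <= y <= Y_RANGE[1]
        and Z_RANGE[0] <= z <= Z_RANGE[1]
    )


if __name__ == "__main__":
    # Peaks reported later in the Results, plus one early visual peak.
    peaks = {
        "visual familiarity (vOT)": (-45, -58, -11),
        "lexical frequency (ITG)": (-54, -55, -14),
        "visual complexity (calcarine)": (-15, -94, -5),
    }
    for label, xyz in peaks.items():
        print(label, in_vot_box(*xyz))
```

Checking reported peaks against the box in this way shows that the visual familiarity and lexical frequency peaks both fall within the ROI bounds, whereas the early visual peak associated with visual complexity does not.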
For all imaging analyses, activations were considered significant based on voxel-level inference of p < 0.05, corrected for multiple comparisons either within the ROI (Z > 3.30) or across the entire brain (Z > 4.60). In order to visualize the pattern of activation within a region, we plotted the mean effect size per condition within a 5 mm-radius sphere centered on the peak coordinate. No inferential statistics were based on these effect size plots.
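The effect size plots average over a small sphere around each peak. A minimal sketch of that step is shown below, assuming effect sizes are stored in a dictionary keyed by voxel-centre coordinates on a 3 mm grid; the data structure, the function names and the example values are all assumptions made for illustration.

```python
import math


def sphere_offsets(radius_mm=5.0, voxel_mm=3):
    """Grid offsets (in mm) whose distance from the centre is <= radius."""
    reach = int(radius_mm // voxel_mm)
    offsets = []
    for i in range(-reach, reach + 1):
        for j in range(-reach, reach + 1):
            for k in range(-reach, reach + 1):
                if math.sqrt(i * i + j * j + k * k) * voxel_mm <= radius_mm:
                    offsets.append((i * voxel_mm, j * voxel_mm, k * voxel_mm))
    return offsets


def sphere_mean(effects, peak, radius_mm=5.0, voxel_mm=3):
    """Mean effect size over voxels within radius_mm of the peak.

    effects: dict mapping (x, y, z) voxel centres in mm to effect sizes.
    Voxels missing from the dict (e.g. outside the brain) are skipped.
    """
    values = []
    for dx, dy, dz in sphere_offsets(radius_mm, voxel_mm):
        key = (peak[0] + dx, peak[1] + dy, peak[2] + dz)
        if key in effects:
            values.append(effects[key])
    if not values:
        raise ValueError("no voxels found within the sphere")
    return sum(values) / len(values)


if __name__ == "__main__":
    peak = (-45, -58, -11)
    effects = {
        peak: 2.0,
        (-42, -58, -11): 4.0,     # 3 mm away, inside the sphere
        (-36, -58, -11): 100.0,   # 9 mm away, outside the sphere
    }
    print(len(sphere_offsets()))
    print(sphere_mean(effects, peak))
```

On a 3 mm grid a 5 mm radius covers the centre voxel plus its face and edge neighbours but not the corner neighbours (about 5.2 mm away), so each plotted value summarizes a small local neighbourhood rather than a single voxel.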

Results
The behavioral results are illustrated in Fig. 1, where the left panel displays accuracy scores and the right panel reaction times. Within the accuracy data, there was a significant main effect of Visual Familiarity (F(1, 33) = 32.8, p < 0.001), confirming that less visually familiar word forms were more difficult. This was qualified by a significant Visual Familiarity × Script interaction (F(1, 33) = 26.6, p < 0.001), indicating that the visual familiarity advantage was significant for Kanji (t(33) = 6.6, p < 0.001) but not for Hiragana (t(33) = 0.9, p = 0.365, n.s.). In addition, there was a significant main effect of Script (F(1, 33) = 14.5, p = 0.001) indicating that responses to Hiragana were more accurate than to Kanji. The analysis of the reaction time data revealed a similar pattern of results. There was a main effect of Visual Familiarity (F(1, 33) = 37.5, p < 0.001) with responses to less visually familiar forms taking longer than those to highly familiar forms (854 vs. 775 ms). This was qualified by a significant interaction (F(1, 33) = 10.1, p = 0.003) indicating that the familiarity effect was larger for Kanji (118 ms) than Hiragana (39 ms). The main effect of Script was not significant (F(1, 33) = 2.7, p = 0.106).
The second behavioral analysis focused on lexical frequency and visual familiarity and the data are shown in Fig. 2. There was a main effect of Lexical Frequency for both accuracy (F(3, 99) = 35.9, p < 0.001) and RTs (F(3, 99) = 24.4, p < 0.001). From the figure it is clear that the lower the lexical frequency, the more difficult the word was with both lower accuracy and longer response times. There was also a main effect of Visual Familiarity for both accuracy (F(1, 33) = 42.5, p < 0.001) and RTs (F(1, 33) = 88.9, p < 0.001). In addition, there was a significant interaction for RTs (F(3, 99) = 5.5, p = 0.002), indicating that visual familiarity had a greater effect on low relative to high frequency items although the interaction for accuracy was not significant (F(3, 99) = 1.4, p = 0.257). This pattern of results remained exactly the same when only the Upper and Lower frequency quartiles were included in the ANOVA. In other words, both lexical frequency and visual familiarity strongly affected overall performance of the task. The question then becomes: to what extent do these two factors affect activation in vOT?

Imaging
The first imaging analysis identified brain regions commonly activated by all four word conditions, in order to determine whether vOT (among other regions) was engaged by both Kanji and Hiragana words, independent of their visual familiarity. As expected, vOT was strongly activated bilaterally, centered on the posterior occipitotemporal sulcus and extending laterally into inferior temporal gyrus, medially onto the crest of the fusiform gyrus and inferiorly into lobule VI of the cerebellum. Other bilateral activations included pars opercularis, the pre-supplementary motor area (pre-SMA), the intraparietal sulcus, a mid-cingulate region, and parts of the basal ganglia. In addition, there were several activations only seen in the left hemisphere including Broca's complex (i.e. pars triangularis, pars orbitalis), the deep frontal operculum, the supramarginal gyrus, and a small cluster in the anterior fusiform gyrus (see Table 2 for full details). These results are consistent with previous visual word recognition studies conducted in alphabetic (Carreiras, Mechelli, Estevez, & Price, 2007; Devlin et al., 2006; Fiebach, Ricker, Friederici, & Jacobs, 2007; Hauk et al., 2008) and logographic (Booth et al., 2006; Hu et al., 2010; Kuo et al., 2003; Tan et al., 2001) languages, indicating a common system engaged by visual word processing, independent of script.
Next, we asked whether visual familiarity modulated vOT activation. The comparison of low relative to high visual familiarity items revealed significant left vOT activation in the ROI [−45 −58 −11, Z = 3.34, p = 0.042] and this did not interact with Script (Fig. 3a and b). Outside of the ROI, the whole brain search revealed activation in left pars triangularis, left pars opercularis, and in the frontal operculum bilaterally (Table 3), three regions previously associated with low relative to high frequency effects in alphabetic languages (Carreiras, Mechelli, & Price, 2006; Fiebach et al., 2002; Kronbichler et al., 2004, 2007). In each of these regions, there was a (non-significant) interaction with Script such that low visual familiarity items increased activation for Kanji more than for Hiragana relative to high visual familiarity items. No regions showed significant activation for the contrast of high relative to low visual familiarity, even when the statistical threshold was lowered to p < 0.001, uncorrected for multiple comparisons.
Next we turned to the effects of lexical frequency. We contrasted the lowest frequency items to the highest in order to maximize the difference in frequency. Activation associated with lexical frequency was found in several left frontal regions, including pars opercularis, pars triangularis, a region of anterior paracingulate sulcus and the right deep frontal operculum (Table 4), consistent with previous studies (Carreiras et al., 2006; Fiebach et al., 2002; Hauk et al., 2008; Kronbichler et al., 2004). In addition, the ROI analysis identified a peak in lateral inferior temporal gyrus [−54 −55 −14, Z = 4.15, p = 0.003] adjacent to, but not overlapping, the activation seen in vOT for visual familiarity (Fig. 3c). In fact it was approximately 1 cm lateral to the visual familiarity peak and located in the inferior temporal gyrus, rather than the occipitotemporal sulcus.
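The separation between the two peaks can be verified directly from the reported MNI coordinates. The short sketch below computes both the lateral (X-axis) offset and the straight-line distance; the variable names are illustrative.

```python
import math

# Peak coordinates (MNI, mm) as reported in the text.
VISUAL_FAMILIARITY_PEAK = (-45, -58, -11)
LEXICAL_FREQUENCY_PEAK = (-54, -55, -14)

# Offset along the left-right axis only.
lateral_offset_mm = abs(LEXICAL_FREQUENCY_PEAK[0] - VISUAL_FAMILIARITY_PEAK[0])

# Straight-line (Euclidean) distance between the two peaks.
distance_mm = math.dist(VISUAL_FAMILIARITY_PEAK, LEXICAL_FREQUENCY_PEAK)

if __name__ == "__main__":
    print(lateral_offset_mm)
    print(round(distance_mm, 2))
```

The peaks differ by 9 mm along the X axis and by just under 10 mm in straight-line distance, in line with the description of the lexical frequency peak as approximately 1 cm lateral.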
Within vOT, there was no main effect of frequency (Fig. 3d). There was, however, a small peak for the interaction of frequency and visual familiarity at [−45 −64 −8], although this did not reach statistical reliability (Z = 2.81, p = 0.177). Nonetheless, the pattern of activation across conditions suggests that visual familiarity modulated low frequency words but not high frequency words (see Fig. 3e). Words in the middle frequency quartiles showed intermediate sized visual familiarity effects.
Finally, we turned to the question of whether the different scripts, Kanji and Hiragana, influenced vOT activation. Relative to Hiragana, Kanji produced significantly greater activation within vOT (Fig. 3a and b). The peak was slightly posterior to the visual familiarity peak, although the clusters of activation were largely overlapping (Fig. 3a). Outside the vOT region-of-interest, no significant activation was found in the whole brain search. The opposite comparison of Hiragana relative to Kanji revealed no significant activation.
To investigate whether the increased activation in vOT for Kanji relative to Hiragana was driven by the inevitable difference in visual complexity across scripts, a 2 × 2 × 2 ANOVA with Script (Kanji, Hiragana), Visual Complexity (high, low), and Group (Tokyo, London) was run. Since the two scripts differed in both the number of strokes and the number of characters, the total number of strokes per trial was used as the measure of visual complexity. There was no main effect of Visual Complexity and no interaction between Visual Complexity and the other two factors within vOT, even at a lenient statistical threshold of p < 0.001 uncorrected for multiple comparisons. In other words, there was no evidence that the vOT activation observed for Kanji relative to Hiragana was a by-product of the greater visual complexity of Kanji words. Outside of our region-of-interest, high relative to low visual complexity was associated with activation in the left calcarine sulcus [−15 −94 −5, Z = 3.68] and the right lingual gyrus [18 −91 −20, Z = 3.90] at a threshold of p = 0.001 (uncorrected).

Table 2. Common activation across the four conditions relative to fixation. For each region, the standard space (MNI) coordinates of the peak voxel and the Z-score for the main effect of words relative to rest at that voxel are shown. In addition, the final four columns display the Z-score for each condition relative to rest at that same coordinate.
[Table 2 column headings: Region | Mean peak coordinate | Z relative to rest]
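As an illustration of the visual complexity measure used in the ANOVA above (total number of strokes per trial), the following minimal Python sketch sums stroke counts over the characters of a displayed word and labels each word as high or low complexity. The stroke table, the example stimuli, the function names and the median split are all assumptions made for illustration; the text does not state how the high and low levels were defined.

```python
from statistics import median

# Illustrative stroke counts only; a real analysis would need a complete
# stroke dictionary covering every Kanji and Hiragana character used.
STROKES = {"円": 4, "林": 8, "山": 3, "え": 2, "ん": 1}


def total_strokes(word):
    """Sum stroke counts over all characters of the displayed word."""
    return sum(STROKES[ch] for ch in word)


def split_by_complexity(words):
    """Label words 'high' or 'low' relative to the median stroke total.

    The median split is an assumption, not the procedure reported in the study.
    """
    totals = {w: total_strokes(w) for w in words}
    cutoff = median(totals.values())
    return {w: ("high" if n > cutoff else "low") for w, n in totals.items()}


# Hypothetical stimuli mixing both scripts.
stimuli = ["円", "えん", "山林", "林"]
labels = split_by_complexity(stimuli)
```

Counting strokes over the whole word, rather than per character, lets a short Kanji word and a longer Hiragana word be compared on a single scale, which is the motivation given for the measure.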

Discussion
The aim of the current study was to test whether visual familiarity and lexical frequency have separable effects on activation levels in vOT, as predicted by the Interactive Account. The results confirmed that visual familiarity, as opposed to lexical frequency, had a strong effect on vOT activation that was qualified by a small (but non-significant) interaction. Visual familiarity had essentially no effect on the most frequent words but a greater effect on the least frequent. In contrast, lexical frequency modulated activation in a region of the inferior temporal gyrus lateral to the visual familiarity effect in vOT. Finally, vOT also showed higher activation for Kanji than Hiragana words, although this was not due to their inherent differences in visual complexity. These findings place important constraints on understanding the nature of neural information processing in the region.
Given that vOT is a region of extrastriate visual cortex, it is perhaps not surprising that the region is sensitive to the familiarity of visual patterns. Indeed, a visual familiarity effect for faces in vOT has been previously reported (Eger, Schweinberger, Dolan, & Henson, 2005). Although written words are a special form of familiar visual patterns, they too appear to be sensitive to this basic property of the visual system (Nazir, Ben-Boutayab, Decoppet, Deutsch, & Frost, 2004; Xue, Chen, Jin, & Dong, 2006; Xue, Jiang, Chen, & Dong, 2008). Within a predictive coding account, this is implemented in terms of more accurate top-down predictions for highly familiar visual patterns. This reduces the prediction error between the top-down and the bottom-up signals, reducing the regional BOLD signal. It is worth noting, however, that this visual familiarity effect is likely to be task-dependent and only present in tasks that place strong demands on integrating bottom-up visual form information with top-down non-visual properties of the stimulus. In the current experiment, linking visual forms to their sound and meaning is important for either recognizing them as a word or correctly rejecting them as a nonword. Tasks with similar demands on vOT processing such as reading aloud or reading for meaning would also be expected to demonstrate greater vOT activation for less visually familiar words. In contrast, one-back tasks or purely perceptual decisions may not show a visual familiarity effect for written words because neither places significant demands on integrating bottom-up visual and top-down non-visual information (Hellyer, Woodhead, Leech, & Wise, 2011; Price & Devlin, 2011; Wang, Yang, Shu, & Zevin, 2011).
Like visual familiarity, lexical frequency would also be expected to modulate the accuracy of top-down predictions into vOT. Indeed, in alphabetic languages, where there is essentially a single visual pattern per word, lexical frequency does affect vOT activation (Chee, Westphal, et al., 2003; Hauk et al., 2008; Joubert et al., 2004; Kronbichler et al., 2004). In Japanese, however, where a word can be written in different scripts, there was no significant main effect of lexical frequency on vOT activation and only weak evidence of its interaction with visual familiarity. For highly frequent words, visual familiarity had no effect on vOT activation whereas activation for low frequency words was modulated by visual familiarity. In contrast, lexical frequency was found to significantly modulate activation in a region of the left inferior temporal gyrus lateral to the area in vOT showing a visual familiarity effect. Previous studies have argued that this is a functionally separate region engaged by multi-modal semantic processing rather than by visual word forms (Cohen, Jobert, Le Bihan, & Dehaene, 2004; Moore & Price, 1999) and our results are consistent with this.
Finally, although both Kanji and Hiragana strongly engaged vOT, there was significantly greater activation for Kanji relative to Hiragana. Because Kanji words generally have more strokes and fewer characters, they tend to be more visually complex than words written in Hiragana. Supplemental analyses, however, showed that the effects of visual complexity manifested in early visual cortices rather than in vOT, consistent with most previous studies (Hsu et al., 2011; Tarkiainen et al., 1999; but see Szwed et al., 2011, who failed to find any significant effects of visual complexity). Thus, we assume the activation difference in vOT reflects the different links between the surface form of the word and its non-visual properties. Specifically, the relation between a Kanji word and its phonological form is largely arbitrary and depends critically on the combination of characters present in the word. Hiragana characters, on the other hand, are nearly 100% consistent in their pronunciation with each one representing a one-to-one mapping to a mora. Consequently, the contribution to prediction error from phonological regions to vOT is much less for Hiragana than for Kanji, resulting in lower vOT activation, analogous to the effect of the orthographic transparency in reading (Nosarti, Mechelli, Green, & Price, 2010; Paulesu et al., 2000).
According to the Interactive Account (Price & Devlin, 2011), all words generate top-down predictions that arrive at vOT and support visual forms consistent with the word (Devlin et al., 2006; Kherif, Josse, & Price, 2011). For instance, a word such as "yen" sends top-down predictions to vOT that support its Kanji (円), Hiragana (えん), and symbolic (¥) forms. These predictions are highly accurate precisely because the word is so common, resulting in essentially equal activation across scripts. In contrast, a less common word such as "wit" sends less accurate top-down predictions to vOT supporting both its Hiragana (とんち) and Kanji (頓知) forms. In this case, the fact that Hiragana is the more visually familiar form results in less prediction error (and therefore less activation) than the Kanji form. By this account, then, a visual familiarity benefit is principally expected for low, but not high, frequency words, a pattern demonstrated in Fig. 3e, but only weakly. Further studies will be required to establish whether this prediction of an interaction between visual familiarity and lexical frequency in vOT is reliable.
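The qualitative prediction in the preceding paragraph can be made concrete with a toy calculation. The sketch below is not the authors' model and is not fitted to any data: the multiplicative form, the [0, 1] scaling, the function names and the example values are all assumptions chosen only to reproduce the described pattern, in which prediction error is near zero for very frequent words regardless of script and is modulated by visual familiarity for less frequent words.

```python
def prediction_error(lexical_frequency: float, visual_familiarity: float) -> float:
    """Toy prediction error for one written form; inputs scaled to [0, 1].

    Assumption (not from the study): error shrinks multiplicatively as either
    the word's frequency or the familiarity of its visual form increases.
    """
    for value in (lexical_frequency, visual_familiarity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must lie in [0, 1]")
    return (1.0 - lexical_frequency) * (1.0 - visual_familiarity)


def familiarity_effect(lexical_frequency: float,
                       familiar: float = 0.9,
                       unfamiliar: float = 0.3) -> float:
    """Extra error for the less familiar form relative to the more familiar one."""
    return (prediction_error(lexical_frequency, unfamiliar)
            - prediction_error(lexical_frequency, familiar))


# Hypothetical scaled frequencies for a very common and a less common word.
high_frequency_effect = familiarity_effect(0.95)
low_frequency_effect = familiarity_effect(0.20)
```

Under these assumed values the familiarity effect is larger for the low frequency word than for the high frequency word, which is the direction of the interaction the account predicts; any other function with the same monotonic properties would serve equally well as an illustration.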
Can the current results also be understood in terms of orthographic input lexicon accounts that posit specialized representations of whole word orthographic patterns? In its strongest form, entries in the lexicon are truly "lexical" and abstract away from visual properties of the word such as capitalization, font, size and even script. Consequently there is a single lexical entry for a word regardless of its script. Obviously, this version is incompatible with the current findings because it cannot explain the activation differences in vOT due to either visual familiarity or script. Some authors have argued, however, that lexical entries in an orthographic input lexicon are specific not only to words but also letter or case identities (Kronbichler et al., 2009). In the case of Japanese orthography, this would entail separate entries for the Kanji and Hiragana forms. By this account, the ease of accessing the form would be modulated by one's experience with the pattern (i.e. its visual familiarity), which is consistent with the main effect of visual familiarity observed here. The observed differences in activation for Kanji and Hiragana, however, are problematic for accounts that claim vOT is an orthographic input lexicon (Glezer et al., 2009; Kronbichler et al., 2004). If every word in an orthographic lexicon has separate Kanji and Hiragana entries and the accessibility of each entry depends on its visual familiarity, then Hiragana and Kanji should produce equivalent activation unless they differ in terms of visual familiarity. Indeed, some previous experiments that reported greater vOT activation for Kanji relative to Hiragana may have confounded script with visual familiarity (Ha Duy Thuy et al., 2004; Nakamura et al., 2005; Sakurai et al., 2000), but in the current experiment, visual familiarity was carefully balanced across scripts so that Kanji was no more familiar than Hiragana (see also Ino et al., 2009). Nonetheless, we still observed a significant increase in activation for Kanji relative to Hiragana that may prove a challenge for the orthographic input lexicon account.
The current findings also raise an important methodological point about the use of alphabetic scripts in reading research. For many valid reasons, the majority of reading research has been conducted with alphabetic languages and has produced considerable advances in the cognitive and neural mechanisms underlying reading (Price & Mechelli, 2005;Shaywitz et al., 2004;Ziegler & Goswami, 2005). In addition, it has repeatedly been shown that both alphabetic and non-alphabetic scripts such as logographs engage essentially the same neuroanatomical system during reading (Booth et al., 2006;Chee, Soon, & Lee, 2003;Chee et al., 2000;Chen et al., 2002;Fu et al., 2002;Hu et al., 2010;Kuo et al., 2003;Tan et al., 2001). This finding, however, means that inferences drawn regarding the nature of regional information processing need to be consistent with a range of writing systems (i.e. alphabetic, syllabographic, logographic, etc.) in order to explain the common information processing across scripts. For instance, Chinese readers engage a region of left middle frontal gyrus (MFG) more strongly than English readers (Tan, Laird, Li, & Fox, 2005;Tan et al., 2003) and it is theoretically possible that this is due to language-specific neuronal responses. A more parsimonious explanation, however, is that Hanji characters increase visuospatial working memory demands relative to letters and this explanation has the advantage of being consistent with reports of left MFG activation in non-linguistic studies of visual working memory (Bledowski, Kaiser, & Rahm, 2010;Kravitz, Saleem, Baker, & Mishkin, 2011). Moreover, it offers a principled account for why this differential activation disappears in Chinese and English dyslexics (Hu et al., 2010). In short, a unified, cross-cultural account of the neural information processing underlying reading requires a systematic investigation of a range of different languages and scripts.
The current study took advantage of a unique property of the Japanese writing system in order to better characterize neural information processing in vOT during visual word recognition. Here we demonstrate a dissociation between lexical frequency and visual familiarity not possible in alphabetic languages and use the findings to evaluate competing theories of vOT function.