A large-scale database of Chinese characters and words collected from elementary school textbooks

Behavior Research Methods

Abstract

Lexical databases are essential tools for studies on language processing and acquisition. Most previous Chinese lexical databases have focused on materials for adults; little is known about reading materials for children or about how the lexical properties of these materials affect children’s reading comprehension. In the present study, we provide the first large database of 2999 Chinese characters and 2182 words collected from the official textbooks recently issued by the Ministry of Education (MOE) of the People’s Republic of China for most elementary schools in Mainland China, together with norms from both school-aged children and adults. The database incorporates key orthographic, phonological, and semantic factors of these lexical units. A word-naming task was used to investigate the effects of these factors on character and word processing in both adults and children. The results suggest that: (1) as grade level increases, the visual complexity of the characters and words increases, whereas their semantic richness and frequency decrease; (2) the effects of lexical predictors on the processing of characters and words differ between children and adults; (3) the effect of age of acquisition shows different patterns in character-naming and word-naming performance. The database is available on Open Science Framework (OSF) (https://osf.io/ynk8c/?view_only=5186bd68549340bd923e9b6531d2c820) for future studies on Chinese language development.

Acknowledgments

The first two authors contributed equally to the manuscript. We thank Leshan Chen for inspiring the research idea. The database is available on Open Science Framework (OSF) (https://osf.io/ynk8c/?view_only=5186bd68549340bd923e9b6531d2c820).

Funding

The study was supported by the National Natural Science Foundation of China (31871097), the National Key Basic Research Program of China (2014CB846102), the Interdisciplinary Research Funds of Beijing Normal University, and the Fundamental Research Funds for the Central Universities (2017XTCX04).

Author information

Corresponding author

Correspondence to Taomei Guo.

Ethics declarations

Conflict of interest

None declared.

About this article

Cite this article

Zhang, M., Liu, Z., Botezatu, M.R. et al. A large-scale database of Chinese characters and words collected from elementary school textbooks. Behav Res (2023). https://doi.org/10.3758/s13428-023-02214-1

  • DOI: https://doi.org/10.3758/s13428-023-02214-1
