Assessing introspective linguistic judgments quantitatively: the case of The Syntax of Chinese

Journal of East Asian Linguistics

Abstract

Informal judgments of the well-formedness of phrases and sentences have long been the primary data source for syntacticians. In recent years, the reliability of data based on linguists’ introspective intuitions has come under increasing scrutiny. Although a number of studies have replicated the vast majority of English judgments published in a textbook and in peer-reviewed journal articles, the status of data in many non-English languages has yet to be examined experimentally. In this work, we employed formal quantitative methods to evaluate the reliability of judgments in the widely used textbook The Syntax of Chinese (Huang et al. 2009). We first assessed example sentences based on acceptability ratings from 148 native Mandarin Chinese speakers. Using a target forced-choice task, we then explored the potentially problematic sentence pairs further. The results of the two experiments suggest an eminently successful replication of the judgments in the book: out of the 557 data samples tested, only five sentence pairs require further investigation. This large-scale study represents the first attempt to replicate the judgments in a non-English syntax textbook, in the hope of bridging the gap between informal data collection in Chinese linguistic research and the protocols of experimental cognitive science.

Notes

  1. It is also possible to obtain informative quantitative data with fewer participants using software tools like the one illustrated in Myers (2009a) or using the Bayesian-framework paradigm proposed by Mahowald et al. (2016).

  2. Sprouse and Almeida (2017) were the first to compare the statistical power of various judgment tasks. A follow-up work by Langsford et al. (2018) estimated how much of the variability within each task is due to psychometric properties, including participant-level individual differences, sample size, response styles, and item effects.

  3. Sprouse et al. (2013) defined “predominantly” as more than 80% of the data points in an article.

  4. While the HLL book was originally published in English, the experimental stimuli were presented in scripts from the book’s simplified Chinese edition (Huang et al. 2013) where spaces segmenting two adjacent words were removed. For all examples in this paper, the page numbers that we refer to are from the book’s English edition.

  5. The experimental materials (and the excluded sentences), data, and code for this manuscript are available at https://osf.io/374h6/.

  6. These quadruples, e.g. a group of four “bad” sentences, cannot be divided into two Pair contrasts, nor can they be analyzed using two-way ANOVAs.

  7. It is also possible to address the rating bias issue by testing baseline items along with target sentences in the same rating experiment. Lin (2018), for example, created three baseline groups in his naturalness-rating experiment by manipulating the degree of word-order and grammaticality violation in sentence items.

  8. Seven participants majored in language-related degree programs, such as linguistics, applied linguistics, Chinese literature, or foreign literature.

  9. In this paper, we choose to also report statistical analyses based on the raw data following recommendations by Juzek (2015) and others.

  10. There is a continuing debate on the gradient acceptability issue in forming syntactic theories (Hofmeister and Sag 2010; Lau et al. 2017; Sprouse et al. 2018, among others).

  11. Alternatively, participants can choose from two sentences sampled at random from the set of stimuli. Langsford et al. (2018) compared different acceptability measures, including the Random pairs model (Thurstone 1927) and the Target pair task used in Experiment 2.

  12. We suspect that a number of participants may have misunderstood the instructions for the catch trials. Fifteen participants correctly answered at least one catch trial, and only seven missed both. Nonetheless, we excluded the data produced by anyone who had failed even one catch trial.

  13. For most contrasts in the Control 1 group, participants did not choose any “bad” sentence. The calculated z-value was therefore smaller in the Control 1 group than in the Control 2 group, as the mixed-effects model considered results of individual contrasts.

  14. We focus on four contrasts where the predicted directionality of results is reversed. For contrast c8-60, more participants indeed chose the acceptable sentence as the preferred one. Its marginal result in Experiment 2 may be a power issue.

  15. Bare NPs in Chinese are often ambiguous between definite and indefinite readings.

  16. See Fukuda (to appear) for a comprehensive review of studies using acceptability and truth value judgment methods in East Asian languages, including Chinese.
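
Note 9 and the caption of Table 5 contrast analyses of raw ratings with analyses of z-transformed ratings. As a minimal sketch of what such a transformation typically involves, assuming the common per-participant convention (each rating minus that participant's mean, divided by that participant's standard deviation), the following Python snippet may help. The function name `z_transform`, the participant labels, and the rating values are hypothetical and are not taken from the study's materials or code.

```python
from statistics import mean, stdev


def z_transform(ratings_by_participant):
    """Convert each participant's raw ratings to z-scores.

    Assumption: the transformation is applied within each participant,
    i.e. z = (rating - participant mean) / participant SD.
    """
    z_scores = {}
    for participant, ratings in ratings_by_participant.items():
        m = mean(ratings)
        sd = stdev(ratings)  # sample SD; requires at least two ratings
        if sd == 0:
            # A participant who gave the same rating everywhere carries
            # no relative information; map all ratings to 0.
            z_scores[participant] = [0.0 for _ in ratings]
        else:
            z_scores[participant] = [(r - m) / sd for r in ratings]
    return z_scores


# Hypothetical ratings: p01 uses the whole scale, p02 only its upper end.
raw = {"p01": [1, 4, 7], "p02": [5, 6, 7]}
z = z_transform(raw)
print(z)
```

In this toy input the two participants use different portions of the scale, yet their z-scores coincide, which is the kind of response-style difference the transformation is meant to remove.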

References

  • Adger, David. 2003. Core syntax: A minimalist approach. Oxford: Oxford University Press.

  • Aoun, Joseph, and Yen-hui Audrey Li. 2003. Essays on the representational and derivational nature of grammar: The diversity of wh-constructions. Cambridge: MIT Press.

  • Birdsong, David. 1989. Metalinguistic performance and interlinguistic competence. Dordrecht: Springer.

  • Chaves, Rui P. and Jeruen E. Dery. 2014. Which subject islands will the acceptability of improve with repeated exposure? In Proceedings of the 31st west coast conference on formal linguistics. Cascadilla proceedings project, ed. R. E. Santana La Barge. Somerville, MA.

  • Chen, Zhong, Lena Jäger, and Shravan Vasishth. 2012. How structure-sensitive is the parser? Evidence from Mandarin Chinese. In Empirical approaches to linguistic theory. Studies in generative grammar, ed. B. Stolterfoht, and S. Featherston, 43–62. Berlin: Mouton de Gruyter.

  • Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.

  • Chomsky, Noam. 1973. Conditions on transformations. In A Festschrift for Morris Halle, ed. S. Anderson, and P. Kiparsky, 232–286. New York: Holt, Rinehart and Winston.

  • Chomsky, Noam. 1986. Barriers. Cambridge: MIT Press.

  • Cowart, Wayne. 1997. Experimental Syntax: Applying objective methods to sentence judgements. Thousand Oaks, CA: SAGE Publications.

  • den Dikken, Marcel, Judy B. Bernstein, Christina Tortora, and Raffaella Zanuttini. 2007. Data and grammar: Means and individuals. Theoretical Linguistics 33 (3): 335–352.

  • Do, Monica L., and Elsi Kaiser. 2017. The relationship between syntactic satiation and syntactic priming: A first look. Frontiers in Psychology: Language Sciences 8: 1851.

  • Edelman, Shimon, and Morten H. Christiansen. 2003. How seriously should we take minimalist syntax? A comment on Lasnik. Trends in Cognitive Sciences 7 (2): 60–61.

  • Erlewine, Michael Yoshitaka, and Hadas Kotek. 2016. A streamlined approach to online linguistic surveys. Natural Language & Linguistic Theory 34 (2): 481–495.

  • Featherston, Sam. 2005. Magnitude estimation and what it can do for your syntax: Some wh-constraints in German. Lingua 115 (11): 1525–1550.

  • Featherston, Sam. 2007. Data in generative grammar: The stick and the carrot. Theoretical Linguistics 33 (3): 269–318.

  • Ferreira, Fernanda. 2005. Psycholinguistics, formal grammars, and cognitive science. The Linguistic Review 22 (2–4): 365–380.

  • Francom, Jerid Cole. 2009. Experimental Syntax: Exploring the effect of repeated exposure to anomalous syntactic structure–evidence from rating and reading tasks. Ph. D. thesis, University of Arizona, Tucson, AZ.

  • Fukuda, Shin. Acceptability and truth value judgment studies in East Asian languages. In The Cambridge handbook of experimental syntax, ed. G. Goodall. Cambridge: Cambridge University Press (to appear).

  • Gibson, Edward, and Evelina Fedorenko. 2010. Weak quantitative standards in linguistics research. Trends in Cognitive Sciences 14 (6): 233–234.

  • Gibson, Edward, and Evelina Fedorenko. 2013. The need for quantitative methods in syntax and semantics research. Language and Cognitive Processes 28 (1–2): 88–124.

  • Gibson, Edward, Steven T. Piantadosi, and Evelina Fedorenko. 2013. Quantitative methods in syntax/semantics research: A response to Sprouse and Almeida (2013). Language and Cognitive Processes 28 (3): 229–240.

  • Gong, Tao, Lan Shuai, and Yicheng Wu. 2019. The acceptability judgment of Chinese pseudo-modifiers with and without a sentential context. PLOS ONE 14 (7): e0219896.

  • Goodall, Grant. 2011. Syntactic satiation and the inversion effect in English and Spanish wh-questions. Syntax 14 (1): 29–47.

  • Hartley, James. 2014. Some thoughts on Likert-type scales. International Journal of Clinical and Health Psychology 14 (1): 83–86.

  • Hiramatsu, Kazuko. 2000. Accessing linguistic competence: Evidence from children’s and adults’ acceptability judgments. Ph. D. thesis, University of Connecticut.

  • Hofmeister, Philip, and Ivan A. Sag. 2010. Cognitive constraints and island effects. Language 86 (2): 366–415.

  • Huang, Cheng-Teh James. 1982. Logical relations in Chinese and the theory of grammar. Ph. D. thesis, MIT.

  • Huang, Cheng-Teh James, Yen-Hui Audrey Li, and Yafei Li. 2009. The syntax of Chinese. Cambridge: Cambridge University Press.

  • Huang, Cheng-Teh James, Yen-Hui Audrey Li, and Yafei Li. 2013. Hanyu Jufa Xue [The Syntax of Chinese] (Simplified Chinese ed.; Yang Gu, Ed. and Heyou Zhang, Trans.). Beijing, China: World Publishing Corporation.

  • Juzek, Thomas. 2015. Acceptability judgement tasks and grammatical theory. Ph. D. thesis, University of Oxford.

  • Juzek, Thomas, and Jana Häussler. Data convergence in syntactic theory and the role of sentence pairs. Zeitschrift für Sprachwissenschaft (to appear).

  • Keller, Frank. 2000. Gradience in grammar: Experimental and computational aspects of degrees of grammaticality. Ph. D. thesis, University of Edinburgh.

  • Khoo, Yong Kang, and Jingxia Lin. 2018. Grammatical variations between Singapore, Mainland China, and Taiwan Mandarin: A pilot study of aspect marking. In Proceedings of the 32nd Pacific Asia conference on language, information and computation.

  • Labov, William. 1978. Sociolinguistics. In A survey of linguistic science, ed. W.O. Dingwall, 339–72. Stamford, CT: Greylock.

  • Langendoen, D. Terence, Nancy Kalish-Landon, and John Dore. 1973. Dative questions: A study in the relation of acceptability to grammaticality of an English sentence type. Cognition 2 (4): 451–478.

  • Langsford, Steven, Amy Perfors, Andrew T. Hendrickson, Lauren A. Kennedy, and Danielle J. Navarro. 2018. Quantifying sentence acceptability measures: Reliability, bias, and variability. Glossa: A Journal of General Linguistics 3 (1): 1–34.

  • Lau, Jey Han, Alexander Clark, and Shalom Lappin. 2017. Grammaticality, acceptability, and probability: A probabilistic view of linguistic knowledge. Cognitive Science 41 (5): 1202–1241.

  • Laws, Jacqueline, and Boping Yuan. 2010. Is the core-peripheral distinction for unaccusative verbs cross-linguistically consistent?: Empirical evidence from Mandarin. Chinese Language and Discourse 1 (2): 220–263.

  • Levelt, Willem J.M., J.A.W.M. van Gent, A.F.J. Haans, and A.J.A. Meijers. 1977. Grammaticality, paraphrase, and imagery. In Acceptability in language, ed. S. Greenbaum, 87–101. The Hague: Mouton.

  • Likert, Rensis. 1932. A technique for the measurement of attitudes. Archives of Psychology 140: 44–60.

  • Lin, Chien-Jer Charles. 2012. Distinguishing grammatical and processing explanations of syntactic acceptability. In In search of grammar: Experimental and corpus-based studies. Language and linguistics monograph series, vol. 48, ed. J. Myers. Taipei: Academia Sinica.

  • Lin, Chien-Jer Charles. 2018. Subject prominence and processing dependencies in prenominal relative clauses: The comprehension of possessive relative clauses and adjunct relative clauses in Mandarin Chinese. Language 94 (4): 758–797.

  • Linzen, Tal, and Yohei Oseki. 2018. The reliability of acceptability judgments across languages. Glossa: A Journal of General Linguistics 3 (1) (100): 1–25.

  • Lu, Jiayi, Cynthia K. Thompson, and Masaya Yoshida. 2020. Chinese wh-in-situ and islands: A formal judgment study. Linguistic Inquiry 51 (3).

  • Mahowald, Kyle, Peter Graff, Jeremy Hartman, and Edward Gibson. 2016. Snap judgments: A small N acceptability paradigm (SNAP) for linguistic acceptability judgments. Language 92 (3): 619–635.

  • Marantz, Alec. 2005. Generative linguistics within the cognitive neuroscience of language. The Linguistic Review 22 (2–4): 429–445.

  • Munro, Robert, Steven Bethard, Victor Kuperman, Vicky Tzuyin Lai, Robin Melnick, Christopher Potts, Tyler Schnoebelen, and Harry Tily. 2010. Crowdsourcing and language studies: The new generation of linguistic data. In Proceedings of the NAACL-HLT 2010 workshop on creating speech and language data with Amazon’s Mechanical Turk, 122–130. Association for Computational Linguistics.

  • Myers, James. 2007. MiniJudge: Software for small-scale experimental syntax. International Journal of Computational Linguistics & Chinese Language Processing 12: 175–194.

  • Myers, James. 2009a. The design and analysis of small-scale syntactic judgment experiments. Lingua 119 (3): 425–444.

  • Myers, James. 2009b. Syntactic judgment experiments. Language and Linguistics Compass 3 (1): 406–423.

  • Myers, James. 2012. Testing adjunct and conjunct island constraints in Chinese. Language and Linguistics 13 (3): 437.

  • Newmeyer, Frederick J. 1983. Grammatical theory: Its limits and its possibilities. Chicago: University of Chicago Press.

  • Newmeyer, Frederick J. 2013. Goals and methods of generative syntax. In The Cambridge handbook of generative syntax, ed. M. den Dikken, 61–92. Cambridge: Cambridge University Press.

  • Ou, Tzu-Shan. 2006. Suo relative clauses in Mandarin Chinese. Master’s thesis, National Chung Cheng University, Taiwan.

  • Phillips, Colin. 2009. Should we impeach armchair linguists? In Japanese/Korean Linguistics, vol. 17, ed. S. Iwasaki, 49–64. Stanford: CSLI Publications.

  • Phillips, Colin, and Howard Lasnik. 2003. Linguistics and empirical evidence: Reply to Edelman and Christiansen. Trends in Cognitive Sciences 7 (2): 61–62.

  • Rosenbach, Anette. 2003. Aspects of iconicity and economy in the choice between the s-genitive and the of-genitive in English. In Determinants of grammatical variation in English. Volume 43 of topics in English linguistics [TiEL], ed. G. Rohdenburg, and B. Mondorf, 379–412. Berlin: De Gruyter Mouton.

  • Schütze, Carson. 2020. Acceptability ratings cannot be taken at face value. In Linguistic intuitions, ed. S. Schindler, A. Drozdzowicz, and K. Brøcker. Oxford: Oxford University Press.

  • Schütze, Carson. 1996. The empirical base of linguistics: Grammaticality judgments and linguistic methodology. Chicago: University of Chicago Press.

  • Schütze, Carson, and Jon Sprouse. 2014. Judgment data. In Research methods in linguistics, ed. R.J. Podesva, and D. Sharma, 27–50. Cambridge: Cambridge University Press.

  • Scontras, Gregory, Maria Polinsky, Cheng-Yu Edwin Tsai, and Kenneth Mai. 2017. Cross-linguistic scope ambiguity: When two systems meet. Glossa: A Journal of General Linguistics 2 (1): 1–28.

  • Scontras, Gregory, Cheng-Yu Edwin Tsai, Kenneth Mai, and Maria Polinsky. 2014. Chinese scope: An experimental investigation. Proceedings of Sinn und Bedeutung 18: 396–414.

  • Shi, Dingxu. 1994. The nature of Chinese wh-questions. Natural Language & Linguistic Theory 12 (2): 301–333.

  • Snyder, William. 2000. An experimental investigation of syntactic satiation effects. Linguistic Inquiry 31 (3): 575–582.

  • Song, Sanghoun, Jae-Woong Choe, and Eunjeong Oh. 2014. FAQ: Do non-linguists share the same intuition as linguists? Language Research 50 (2): 357–386.

  • Sprouse, Jon. 2009. Revisiting satiation: Evidence for an equalization response strategy. Linguistic Inquiry 40 (2): 329–341.

  • Sprouse, Jon. 2011. A validation of Amazon Mechanical Turk for the collection of acceptability judgments in linguistic theory. Behavior Research Methods 43 (1): 155–167.

  • Sprouse, Jon, and Diogo Almeida. 2012. Assessing the reliability of textbook data in syntax: Adger’s Core Syntax. Journal of Linguistics 48 (3): 609–652.

  • Sprouse, Jon, and Diogo Almeida. 2013. The empirical status of data in syntax: A reply to Gibson and Fedorenko. Language and Cognitive Processes 28 (3): 222–228.

  • Sprouse, Jon, and Diogo Almeida. 2017. Design sensitivity and statistical power in acceptability judgment experiments. Glossa: A Journal of General Linguistics 2 (1): 1–32.

  • Sprouse, Jon, Carson Schütze, and Diogo Almeida. 2013. Assessing the reliability of journal data in syntax: Linguistic inquiry 2001–2010. Lingua 134: 219–248.

  • Sprouse, Jon, Matt Wagers, and Colin Phillips. 2012. Working-memory capacity and island effects: A reminder of the issues and the facts. Language 88 (2): 401–407.

  • Sprouse, Jon, Beracah Yankama, Sagar Indurkhya, Sandiway Fong, and Robert C. Berwick. 2018. Colorless green ideas do sleep furiously: Gradient acceptability and the nature of the grammar. The Linguistic Review 35 (3): 575–599.

  • Thurstone, Louis L. 1927. A law of comparative judgment. Psychological Review 34 (4): 273.

  • Wang, Shichang, Chu-Ren Huang, Yao Yao, and Angel Chan. 2015. Mechanical Turk-based experiment vs laboratory-based experiment: A case study on the comparison of semantic transparency rating data. In Proceedings of the 29th Pacific Asia conference on language, information and computation, 53–62.

  • Xu, Liejiong. 1990. Remarks on LF movement in Chinese questions. Linguistics 28 (2): 355–383.

  • Xu, Liejiong. 1996. Construction and destruction of theories by data: A case study. Chicago Linguistics Society 32: 107–118.

  • Yao, Yao, Zhiguo Xie, Chien-Jer Charles Lin, and Chu-Ren Huang. Acceptability or grammaticality: Judging Chinese sentences for linguistic studies. In Cambridge handbook of Chinese linguistics. Cambridge: Cambridge University Press (to appear).

  • Zhou, Peng, and Liqun Gao. 2009. Scope processing in Chinese. Journal of Psycholinguistic Research 38: 11–24.

Acknowledgements

This work was supported by a faculty research grant to Zhong Chen from the College of Liberal Arts at Rochester Institute of Technology. We are grateful to Jeff Runner for discussions at various stages of this project. We thank Tian Tian for her assistance in preparing experimental materials, Qingrong Chen, Qiongpeng Luo and Zhuang Wu for helping with recruiting participants, as well as Jacquelyn Haller for editorial suggestions. We would also like to thank the anonymous reviewers and the editors of this journal for their insightful comments and suggestions.

Author information

Corresponding author

Correspondence to Zhong Chen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Figs. 8, 9 and Tables 5, 6.

Fig. 8 Analyzing the raw data of Experiment 1 suggests that the rating difference was not significant between the two conditions in 9 Pair contrasts, in addition to those in Fig. 5. Error bars represent the 95% CI

Fig. 9 The proportional difference of choosing a “good” or “bad” sentence was significant in all but one contrast in the Control 2 group of Experiment 2

Table 5 The acceptability rating differences between the two members were not significant (significance criterion: \(t>2\)) in 17 Pair contrasts when the z-transformed data were analyzed in Experiment 1. Re-testing them in Experiment 2 using a forced-choice task suggests that 5 contrasts, highlighted in red, again failed to reach statistical significance (criterion: \(z>2\))
Table 6 In Experiment 2, the proportional differences of choice in all but one contrast in the Control 2 group reached significance
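
The captions of Table 6 and Fig. 9 refer to the proportional difference between choosing the “good” and the “bad” sentence in a forced-choice contrast. The sketch below illustrates one simple way to test such a difference for a single contrast: an exact two-sided binomial test against chance (0.5). This is a swapped-in simplification and not the analysis reported in the paper, which (per Note 13) relied on a mixed-effects model; the function name `binomial_two_sided_p` and the counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    k: number of participants who chose the "good" sentence
    n: total number of participants who judged the contrast
    p: probability of choosing the "good" sentence under the null (chance)

    The p-value sums the probabilities of all outcomes that are no more
    likely than the observed one.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so that ties are counted despite rounding.
    total = sum(pr for pr in probs if pr <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts for two contrasts judged by 20 participants each.
clear_preference = binomial_two_sided_p(18, 20)  # 18 of 20 chose "good"
no_preference = binomial_two_sided_p(10, 20)     # choices split evenly
print(clear_preference, no_preference)
```

A contrast with an even split yields a p-value of 1, whereas a strongly one-sided split yields a very small p-value; a per-contrast test of this kind ignores participant and item variability, which is what a mixed-effects model is designed to capture.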

Cite this article

Chen, Z., Xu, Y. & Xie, Z. Assessing introspective linguistic judgments quantitatively: the case of The Syntax of Chinese. J East Asian Linguist 29, 311–336 (2020). https://doi.org/10.1007/s10831-020-09210-y

  • DOI: https://doi.org/10.1007/s10831-020-09210-y
