Assessing introspective linguistic judgments quantitatively: the case of The Syntax of Chinese

Chen, Zhong; Xu, Yuhang; Xie, Zhiguo

doi:10.1007/s10831-020-09210-y

Published: 21 July 2020

Volume 29, pages 311–336 (2020)

Journal of East Asian Linguistics

Zhong Chen¹,
Yuhang Xu² &
Zhiguo Xie³

Abstract

Informal judgments of the well-formedness of phrases and sentences have long been the primary data source for syntacticians. In recent years, the reliability of data based on linguists’ introspective intuitions has come under increasing scrutiny. Although a number of studies were able to replicate the vast majority of English judgments published in a textbook and in peer-reviewed journal articles, the status of data in many non-English languages has yet to be experimentally examined. In this work, we employed formal quantitative methods to evaluate the reliability of judgments in the widely used textbook, The Syntax of Chinese (Huang et al. 2009). We first assessed example sentences based on the acceptability ratings from 148 native Mandarin Chinese speakers. Using a target forced-choice task, we further explored the potentially problematic sentence pairs. Results of the two experiments suggest an eminently successful replication of judgments in the book: out of the 557 data samples tested, only five sentence pairs require further investigation. This large-scale study represents the first attempt to replicate the judgments in a non-English syntax textbook, in the hope of bridging the gap between informal data collection in Chinese linguistic research and the protocols of experimental cognitive science.

Notes

It is also possible to obtain informative quantitative data with fewer participants using software tools like the one illustrated in Myers (2009a) or using the Bayesian-framework paradigm proposed by Mahowald et al. (2016).
Sprouse and Almeida (2017) were the first to compare the statistical power of various judgment tasks. A follow-up work by Langsford et al. (2018) estimated how much of the variability within each task is due to psychometric properties, including participant-level individual differences, sample size, response styles, and item effects.
Sprouse et al. (2013) defined “predominantly” as more than 80% of the data points in an article.
While the HLL book was originally published in English, the experimental stimuli were presented in scripts from the book’s simplified Chinese edition (Huang et al. 2013) where spaces segmenting two adjacent words were removed. For all examples in this paper, the page numbers that we refer to are from the book’s English edition.
The experimental materials (and the excluded sentences), data, and code for this manuscript are available at https://osf.io/374h6/.
These quadruples, e.g., a group of four “bad” sentences, cannot be divided into two Pair contrasts, nor can they be analyzed using two-way ANOVAs.
It is also possible to address the rating bias issue by testing baseline items along with target sentences in the same rating experiment. Lin (2018), for example, created three baseline groups in his naturalness-rating experiment by manipulating the degree of word-order and grammaticality violation in sentence items.
Seven participants majored in language-related degree programs, such as linguistics, applied linguistics, Chinese literature, or foreign literature.
In this paper, we choose to also report statistical analyses based on the raw data following recommendations by Juzek (2015) and others.
There is a continuing debate on the gradient acceptability issue in forming syntactic theories (Hofmeister and Sag 2010; Lau et al. 2017; Sprouse et al. 2018, among others).
Alternatively, participants can choose from two sentences sampled at random from the set of stimuli. Langsford et al. (2018) compared different acceptability measures, including the Random pairs model (Thurstone 1927) and the Target pair task used in Experiment 2.
We suspect that a number of participants may have misunderstood the instructions for the catch trials. Fifteen participants correctly answered at least one catch trial and only seven missed both. Nonetheless, we excluded the data produced by anyone who had failed even one catch trial.
For most contrasts in the Control 1 group, participants did not choose any “bad” sentence. The calculated z-value was therefore smaller in the Control 1 group than in the Control 2 group, as the mixed-effects model considered results of individual contrasts.
We focus on four contrasts where the predicted directionality of results is reversed. For contrast c8-60, more participants indeed chose the acceptable sentence as the preferred one. Its marginal result in Experiment 2 may be a power issue.
Bare NPs in Chinese are often ambiguous between definite and indefinite readings.
See Fukuda (to appear) for a comprehensive review of studies using acceptability and truth value judgment methods in East Asian languages, including Chinese.
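
Several of the notes above touch on data preparation: excluding participants who failed a catch trial, and analyzing z-transformed ratings alongside the raw data. A minimal Python sketch of those two steps follows. It is an illustration under stated assumptions, not the authors' code (their materials are at the OSF link given above): the function names, the dictionary layout, and the sample values are all hypothetical, and the z-transformation shown uses each participant's own mean and sample standard deviation, which is a common convention rather than something this page specifies.

```python
from statistics import mean, stdev


def passed_all_catch_trials(catch_results):
    """Return IDs of participants who answered every catch trial correctly.

    catch_results maps a participant ID to a list of booleans
    (True = catch trial answered correctly). Hypothetical layout.
    """
    return {pid for pid, results in catch_results.items() if all(results)}


def z_transform(ratings_by_participant):
    """Convert each participant's raw ratings to z-scores.

    Each rating is centered on that participant's mean and divided by
    that participant's sample standard deviation (assumed convention).
    A participant who gave the same rating everywhere has SD 0, so
    their z-scores are set to 0.0 instead of dividing by zero.
    Each participant needs at least two ratings.
    """
    transformed = {}
    for pid, ratings in ratings_by_participant.items():
        m = mean(ratings)
        sd = stdev(ratings)
        transformed[pid] = [(r - m) / sd if sd > 0 else 0.0 for r in ratings]
    return transformed


# Made-up example data: three participants, two catch trials each.
catch = {"p1": [True, True], "p2": [True, False], "p3": [False, False]}
ratings = {"p1": [1, 4, 7], "p2": [2, 2, 6], "p3": [5, 5, 5]}

kept = passed_all_catch_trials(catch)
z_scores = z_transform({pid: r for pid, r in ratings.items() if pid in kept})
```

In this toy data only the hypothetical participant `p1` survives the exclusion rule, mirroring the strict criterion described in the note on catch trials. Normalizing within participants is one way to reduce the influence of individual response styles on a rating scale.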

References

Adger, David. 2003. Core syntax: A minimalist approach. Oxford: Oxford University Press.
Aoun, Joseph, and Yen-hui Audrey Li. 2003. Essays on the representational and derivational nature of grammar: The diversity of wh-constructions. Cambridge: MIT Press.
Birdsong, David. 1989. Metalinguistic performance and interlinguistic competence. Dordrecht: Springer.
Chaves, Rui P., and Jeruen E. Dery. 2014. Which subject islands will the acceptability of improve with repeated exposure? In Proceedings of the 31st West Coast Conference on Formal Linguistics, ed. R. E. Santana La Barge. Somerville, MA: Cascadilla Proceedings Project.
Chen, Zhong, Lena Jäger, and Shravan Vasishth. 2012. How structure-sensitive is the parser? Evidence from Mandarin Chinese. In Empirical approaches to linguistic theory. Studies in generative grammar, ed. B. Stolterfoht, and S. Featherston, 43–62. Berlin: Mouton de Gruyter.
Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, Noam. 1973. Conditions on transformations. In A Festschrift for Morris Halle, ed. S. Anderson, and P. Kiparsky, 232–286. New York: Holt, Rinehart and Winston.
Chomsky, Noam. 1986. Barriers. Cambridge: MIT Press.
Cowart, Wayne. 1997. Experimental Syntax: Applying objective methods to sentence judgements. Thousand Oaks, CA: SAGE Publications.
den Dikken, Marcel, Judy B. Bernstein, Christina Tortora, and Raffaella Zanuttini. 2007. Data and grammar: Means and individuals. Theoretical Linguistics 33 (3): 335–352.
Do, Monica L., and Elsi Kaiser. 2017. The relationship between syntactic satiation and syntactic priming: A first look. Frontiers in Psychology: Language Sciences 8: 1851.
Edelman, Shimon, and Morten H. Christiansen. 2003. How seriously should we take minimalist syntax? A comment on Lasnik. Trends in Cognitive Sciences 7 (2): 60–61.
Erlewine, Michael Yoshitaka, and Hadas Kotek. 2016. A streamlined approach to online linguistic surveys. Natural Language & Linguistic Theory 34 (2): 481–495.
Featherston, Sam. 2005. Magnitude estimation and what it can do for your syntax: Some wh-constraints in German. Lingua 115 (11): 1525–1550.
Featherston, Sam. 2007. Data in generative grammar: The stick and the carrot. Theoretical Linguistics 33 (3): 269–318.
Ferreira, Fernanda. 2005. Psycholinguistics, formal grammars, and cognitive science. The Linguistic Review 22 (2–4): 365–380.
Francom, Jerid Cole. 2009. Experimental Syntax: Exploring the effect of repeated exposure to anomalous syntactic structure–evidence from rating and reading tasks. Ph. D. thesis, University of Arizona, Tucson, AZ.
Fukuda, Shin. Acceptability and truth value judgment studies in East Asian languages. In The Cambridge handbook of experimental syntax, ed. G. Goodall. Cambridge: Cambridge University Press (to appear).
Gibson, Edward, and Evelina Fedorenko. 2010. Weak quantitative standards in linguistics research. Trends in Cognitive Sciences 14 (6): 233–234.
Gibson, Edward, and Evelina Fedorenko. 2013. The need for quantitative methods in syntax and semantics research. Language and Cognitive Processes 28 (1–2): 88–124.
Gibson, Edward, Steven T. Piantadosi, and Evelina Fedorenko. 2013. Quantitative methods in syntax/semantics research: A response to Sprouse and Almeida (2013). Language and Cognitive Processes 28 (3): 229–240.
Gong, Tao, Lan Shuai, and Yicheng Wu. 2019. The acceptability judgment of Chinese pseudo-modifiers with and without a sentential context. PLOS ONE 14 (7): e0219896.
Goodall, Grant. 2011. Syntactic satiation and the inversion effect in English and Spanish wh-questions. Syntax 14 (1): 29–47.
Hartley, James. 2014. Some thoughts on Likert-type scales. International Journal of Clinical and Health Psychology 14 (1): 83–86.
Hiramatsu, Kazuko. 2000. Accessing linguistic competence: Evidence from children’s and adults’ acceptability judgments. Ph. D. thesis, University of Connecticut.
Hofmeister, Philip, and Ivan A. Sag. 2010. Cognitive constraints and island effects. Language 86 (2): 366–415.
Huang, Cheng-Teh James. 1982. Logical relations in Chinese and the theory of grammar. Ph. D. thesis, MIT.
Huang, Cheng-Teh James, Yen-Hui Audrey Li, and Yafei Li. 2009. The syntax of Chinese. Cambridge: Cambridge University Press.
Huang, Cheng-Teh James, Yen-Hui Audrey Li, and Yafei Li. 2013. Hanyu Jufa Xue [The Syntax of Chinese] (Simplified, Chinese ed.; Yang Gu, Ed. and Heyou Zhang, Trans.). Beijing, China: World Publishing Corporation.
Juzek, Thomas. 2015. Acceptability judgement tasks and grammatical theory. Ph. D. thesis, University of Oxford.
Juzek, Thomas, and Jana Häussler. Data convergence in syntactic theory and the role of sentence pairs. Zeitschrift für Sprachwissenschaft (to appear).
Keller, Frank. 2000. Gradience in grammar: Experimental and computational aspects of degrees of grammaticality. Ph. D. thesis, University of Edinburgh.
Khoo, Yong Kang, and Jingxia Lin. 2018. Grammatical variations between Singapore, Mainland China, and Taiwan Mandarin: A pilot study of aspect marking. In Proceedings of the 32nd Pacific Asia conference on language, information and computation.
Labov, William. 1978. Sociolinguistics. In A survey of linguistic science, ed. W.O. Dingwall, 339–72. Stamford, CT: Greylock.
Langendoen, D. Terence, Nancy Kalish-Landon, and John Dore. 1973. Dative questions: A study in the relation of acceptability to grammaticality of an English sentence type. Cognition 2 (4): 451–478.
Langsford, Steven, Amy Perfors, Andrew T. Hendrickson, Lauren A. Kennedy, and Danielle J. Navarro. 2018. Quantifying sentence acceptability measures: Reliability, bias, and variability. Glossa: A Journal of General Linguistics 3 (1): 1–34.
Lau, Jey Han, Alexander Clark, and Shalom Lappin. 2017. Grammaticality, acceptability, and probability: A probabilistic view of linguistic knowledge. Cognitive Science 41 (5): 1202–1241.
Laws, Jacqueline, and Boping Yuan. 2010. Is the core-peripheral distinction for unaccusative verbs cross-linguistically consistent?: Empirical evidence from Mandarin. Chinese Language and Discourse 1 (2): 220–263.
Levelt, Willem J.M., J.A.W.M. van Gent, A.F.J. Haans, and A.J.A. Meijers. 1977. Grammaticality, paraphrase, and imagery. In Acceptability in language, ed. S. Greenbaum, 87–101. The Hague: Mouton.
Likert, Rensis. 1932. A technique for the measurement of attitudes. Archives of Psychology 140: 44–60.
Lin, Chien-Jer Charles. 2012. Distinguishing grammatical and processing explanations of syntactic acceptability. In In search of grammar: Experimental and corpus-based studies. Language and linguistics monograph series, vol. 48, ed. J. Myers. Taipei: Academia Sinica.
Lin, Chien-Jer Charles. 2018. Subject prominence and processing dependencies in prenominal relative clauses: The comprehension of possessive relative clauses and adjunct relative clauses in Mandarin Chinese. Language 94 (4): 758–797.
Linzen, Tal, and Yohei Oseki. 2018. The reliability of acceptability judgments across languages. Glossa: A Journal of General Linguistics 3 (1), article 100: 1–25.
Lu, Jiayi, Cynthia K. Thompson, and Masaya Yoshida. 2020. Chinese wh-in-situ and islands: A formal judgment study. Linguistic Inquiry 51 (3).
Mahowald, Kyle, Peter Graff, Jeremy Hartman, and Edward Gibson. 2016. Snap judgments: A small N acceptability paradigm (SNAP) for linguistic acceptability judgments. Language 92 (3): 619–635.
Marantz, Alec. 2005. Generative linguistics within the cognitive neuroscience of language. The Linguistic Review 22 (2–4): 429–445.
Munro, Robert, Steven Bethard, Victor Kuperman, Vicky Tzuyin Lai, Robin Melnick, Christopher Potts, Tyler Schnoebelen, and Harry Tily. 2010. Crowdsourcing and language studies: The new generation of linguistic data. In Proceedings of the NAACL-HLT 2010 workshop on creating speech and language data with Amazon’s Mechanical Turk, 122–130. Association for Computational Linguistics.
Myers, James. 2007. MiniJudge: Software for small-scale experimental syntax. International Journal of Computational Linguistics & Chinese Language Processing 12: 175–194.
Myers, James. 2009a. The design and analysis of small-scale syntactic judgment experiments. Lingua 119 (3): 425–444.
Myers, James. 2009b. Syntactic judgment experiments. Language and Linguistics Compass 3 (1): 406–423.
Myers, James. 2012. Testing adjunct and conjunct island constraints in Chinese. Language and Linguistics 13 (3): 437.
Newmeyer, Frederick J. 1983. Grammatical theory: Its limits and its possibilities. Chicago: University of Chicago Press.
Newmeyer, Frederick J. 2013. Goals and methods of generative syntax. In The Cambridge handbook of generative syntax, ed. M. den Dikken, 61–92. Cambridge: Cambridge University Press.
Ou, Tzu-Shan. 2006. Suo relative clauses in Mandarin Chinese. Master’s thesis, National Chung Cheng University, Taiwan.
Phillips, Colin. 2009. Should we impeach armchair linguists? In Japanese/Korean Linguistics, vol. 17, ed. S. Iwasaki, 49–64. Stanford: CSLI Publications.
Phillips, Colin, and Howard Lasnik. 2003. Linguistics and empirical evidence: Reply to Edelman and Christiansen. Trends in Cognitive Sciences 7 (2): 61–62.
Rosenbach, Anette. 2003. Aspects of iconicity and economy in the choice between the s-genitive and the of-genitive in English. In Determinants of grammatical variation in English. Volume 43 of topics in English linguistics [TiEL], ed. G. Rohdenburg, and B. Mondorf, 379–412. Berlin: De Gruyter Mouton.
Schütze, Carson. 2020. Acceptability ratings cannot be taken at face value. In Linguistic intuitions, ed. S. Schindler, A. Drozdzowicz, and K. Brøcker. Oxford: Oxford University Press.
Schütze, Carson. 1996. The empirical base of linguistics: Grammaticality judgments and linguistic methodology. Chicago: University of Chicago Press.
Schütze, Carson, and Jon Sprouse. 2014. Judgment data. In Research methods in linguistics, ed. R.J. Podesva, and D. Sharma, 27–50. Cambridge: Cambridge University Press.
Scontras, Gregory, Maria Polinsky, Cheng-Yu Edwin Tsai, and Kenneth Mai. 2017. Cross-linguistic scope ambiguity: When two systems meet. Glossa: A Journal of General Linguistics 2 (1): 1–28.
Scontras, Gregory, Cheng-Yu Edwin Tsai, Kenneth Mai, and Maria Polinsky. 2014. Chinese scope: An experimental investigation. Proceedings of Sinn und Bedeutung 18: 396–414.
Shi, Dingxu. 1994. The nature of Chinese wh-questions. Natural Language & Linguistic Theory 12 (2): 301–333.
Snyder, William. 2000. An experimental investigation of syntactic satiation effects. Linguistic Inquiry 31 (3): 575–582.
Song, Sanghoun, Jae-Woong Choe, and Eunjeong Oh. 2014. FAQ: Do non-linguists share the same intuition as linguists? Language Research 50 (2): 357–386.
Sprouse, Jon. 2009. Revisiting satiation: Evidence for an equalization response strategy. Linguistic Inquiry 40 (2): 329–341.
Sprouse, Jon. 2011. A validation of Amazon Mechanical Turk for the collection of acceptability judgments in linguistic theory. Behavior Research Methods 43 (1): 155–167.
Sprouse, Jon, and Diogo Almeida. 2012. Assessing the reliability of textbook data in syntax: Adger’s Core Syntax. Journal of Linguistics 48 (3): 609–652.
Sprouse, Jon, and Diogo Almeida. 2013. The empirical status of data in syntax: A reply to Gibson and Fedorenko. Language and Cognitive Processes 28 (3): 222–228.
Sprouse, Jon, and Diogo Almeida. 2017. Design sensitivity and statistical power in acceptability judgment experiments. Glossa: A Journal of General Linguistics 2 (1): 1–32.
Sprouse, Jon, Carson Schütze, and Diogo Almeida. 2013. Assessing the reliability of journal data in syntax: Linguistic inquiry 2001–2010. Lingua 134: 219–248.
Sprouse, Jon, Matt Wagers, and Colin Phillips. 2012. Working-memory capacity and island effects: A reminder of the issues and the facts. Language 88 (2): 401–407.
Sprouse, Jon, Beracah Yankama, Sagar Indurkhya, Sandiway Fong, and Robert C. Berwick. 2018. Colorless green ideas do sleep furiously: Gradient acceptability and the nature of the grammar. The Linguistic Review 35 (3): 575–599.
Thurstone, Louis L. 1927. A law of comparative judgment. Psychological Review 34 (4): 273.
Wang, Shichang, Chu-Ren Huang, Yao Yao, and Angel Chan. 2015. Mechanical turk-based experiment vs laboratory-based experiment: A case study on the comparison of semantic transparency rating data. In Proceedings of the 29th Pacific Asia conference on language, information and computation, pp. 53–62.
Xu, Liejiong. 1990. Remarks on LF movement in Chinese questions. Linguistics 28 (2): 355–383.
Xu, Liejiong. 1996. Construction and destruction of theories by data: A case study. Chicago Linguistics Society 32: 107–118.
Yao, Yao, Zhiguo Xie, Chien-Jer Charles Lin, and Chu-Ren Huang. Acceptability or grammaticality: Judging Chinese sentences for linguistic studies. In Cambridge handbook of Chinese linguistics. Cambridge: Cambridge University Press (to appear).
Zhou, Peng, and Liqun Gao. 2009. Scope processing in Chinese. Journal of Psycholinguistic Research 38: 11–24.

Acknowledgements

This work was supported by a faculty research grant to Zhong Chen from the College of Liberal Arts at Rochester Institute of Technology. We are grateful to Jeff Runner for discussions at various stages of this project. We thank Tian Tian for her assistance in preparing experimental materials, Qingrong Chen, Qiongpeng Luo and Zhuang Wu for helping with recruiting participants, as well as Jacquelyn Haller for editorial suggestions. We would also like to thank the anonymous reviewers and the editors of this journal for their insightful comments and suggestions.

Author information

Authors and Affiliations

Department of Modern Languages and Cultures, Rochester Institute of Technology, Rochester, USA
Zhong Chen
Department of Linguistics, University of Rochester, Rochester, USA
Yuhang Xu
Department of East Asian Languages and Literatures, The Ohio State University, Columbus, USA
Zhiguo Xie

Corresponding author

Correspondence to Zhong Chen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Figs. 8, 9 and Tables 5, 6.

Table 5 The acceptability rating differences between the two members were not significant in 17 Pair contrasts (\(t<2\)) when the z-transformed data were analyzed in Experiment 1. Re-testing them in Experiment 2 using forced-choice suggests that 5 contrasts, highlighted in red, again failed to reach statistical significance (\(z<2\))

Table 6 In Experiment 2, the proportional differences of choice in all but one contrast in the Control 2 group reached significance
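
As a rough illustration of what it means for a forced-choice contrast to reach significance, the sketch below computes a normal-approximation z-score for the number of participants choosing the sentence marked acceptable, against a chance level of 0.5. This is a deliberately simplified stand-in, assumed here for exposition only: the article's own analysis relies on a mixed-effects model, and the function name and the counts in the example are hypothetical rather than taken from the tables.

```python
import math


def forced_choice_z(n_chose_good, n_total):
    """Normal-approximation z-score for a two-alternative forced choice.

    Under the null hypothesis each participant picks either sentence
    with probability 0.5, so the count choosing the "good" sentence has
    mean n/2 and standard deviation sqrt(n)/2.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_chose_good <= n_total:
        raise ValueError("n_chose_good must lie between 0 and n_total")
    expected = n_total / 2
    standard_error = math.sqrt(n_total) / 2
    return (n_chose_good - expected) / standard_error


# Hypothetical counts: 75 of 100 participants prefer the "good" sentence.
z_clear = forced_choice_z(75, 100)
# Hypothetical counts: an even split, i.e. no detectable preference.
z_null = forced_choice_z(50, 100)
```

With these made-up counts, the first contrast would exceed a threshold of 2 and the second would not. A per-contrast approximation like this ignores participant and item variability, which is one reason a mixed-effects model is the more appropriate tool for the real data.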

About this article

Cite this article

Chen, Z., Xu, Y. & Xie, Z. Assessing introspective linguistic judgments quantitatively: the case of The Syntax of Chinese. J East Asian Linguist 29, 311–336 (2020). https://doi.org/10.1007/s10831-020-09210-y

Received: 04 October 2019
Accepted: 22 April 2020
Published: 21 July 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s10831-020-09210-y
