Abstract
The Gsearch system allows the selection of sentences by syntactic criteria from text corpora, even when these corpora contain no prior syntactic markup. This is achieved by means of a fast chart parser, which takes as input a grammar and a search expression specified by the user. Gsearch features a modular architecture that can be extended straightforwardly to give access to new corpora. The Gsearch architecture also allows interfacing with external linguistic resources (such as taggers and lexical databases). Gsearch can be used with graphical tools for visualizing the results of a query.
Cite this article
Corley, S., Corley, M., Keller, F. et al. Finding Syntactic Structure in Unparsed Corpora: The Gsearch Corpus Query System. Computers and the Humanities 35, 81–94 (2001). https://doi.org/10.1023/A:1002497503122
DOI: https://doi.org/10.1023/A:1002497503122