Finding Syntactic Structure in Unparsed Corpora: The Gsearch Corpus Query System

Computers and the Humanities

Abstract

The Gsearch system allows the selection of sentences by syntactic criteria from text corpora, even when these corpora contain no prior syntactic markup. This is achieved by means of a fast chart parser, which takes as input a grammar and a search expression specified by the user. Gsearch features a modular architecture that can be extended straightforwardly to give access to new corpora. The Gsearch architecture also allows interfacing with external linguistic resources (such as taggers and lexical databases). Gsearch can be used with graphical tools for visualizing the results of a query.
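
To make the idea in the abstract concrete, the following is a minimal sketch of how a chart parser can select sentences by syntactic criteria from a corpus that carries only part-of-speech tags. It is not Gsearch's implementation: the grammar format, the tag set, the toy corpus, and the names `find_spans` and `search` are all assumptions made for illustration, and the search expression is reduced to a single goal category.

```python
from collections import defaultdict

# Hypothetical toy grammar (not Gsearch's grammar format): each rule maps a
# category to a sequence of categories or part-of-speech tags.
GRAMMAR = [
    ("NP", ("DT", "NN")),
    ("NP", ("DT", "JJ", "NN")),
    ("PP", ("IN", "NP")),
    ("VP", ("VB", "NP")),
    ("VP", ("VB", "NP", "PP")),
]


def find_spans(tags, grammar, goal):
    """Return sorted (start, end) spans of `tags` that can be analysed as `goal`."""
    # chart[start] holds (category, end) edges beginning at position `start`.
    chart = defaultdict(set)
    for i, tag in enumerate(tags):
        chart[i].add((tag, i + 1))

    # Bottom-up closure: keep applying rules until no new edge is added.
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for start in range(len(tags)):
                # Positions reachable from `start` after matching a prefix of rhs.
                ends = {start}
                for symbol in rhs:
                    ends = {e2 for e in ends for (cat, e2) in chart[e] if cat == symbol}
                    if not ends:
                        break
                for end in ends:
                    if (lhs, end) not in chart[start]:
                        chart[start].add((lhs, end))
                        changed = True

    return sorted(
        (start, end)
        for start in range(len(tags))
        for (cat, end) in chart[start]
        if cat == goal
    )


def search(corpus, grammar, goal):
    """Return the word strings of every span in the corpus matching `goal`."""
    hits = []
    for sentence in corpus:
        words = [word for word, _ in sentence]
        tags = [tag for _, tag in sentence]
        for start, end in find_spans(tags, grammar, goal):
            hits.append(" ".join(words[start:end]))
    return hits


# Hypothetical tagged corpus: each sentence is a list of (word, tag) pairs.
CORPUS = [
    [("the", "DT"), ("dog", "NN"), ("saw", "VB"), ("a", "DT"),
     ("cat", "NN"), ("in", "IN"), ("the", "DT"), ("garden", "NN")],
    [("birds", "NNS"), ("sing", "VB")],
]

if __name__ == "__main__":
    for hit in search(CORPUS, GRAMMAR, "VP"):
        print(hit)
```

Because the grammar is passed in as data rather than built into the search routine, the same routine can be reused with a different rule set or tag set, which is one plausible reading of the user-specified grammar mentioned in the abstract.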


Cite this article

Corley, S., Corley, M., Keller, F. et al. Finding Syntactic Structure in Unparsed Corpora: The Gsearch Corpus Query System. Computers and the Humanities 35, 81–94 (2001). https://doi.org/10.1023/A:1002497503122
