Finding Syntactic Structure in Unparsed Corpora: The Gsearch Corpus Query System

Corley, Steffan; Corley, Martin; Keller, Frank; Crocker, Matthew W.; Trewin, Shari

doi:10.1023/A:1002497503122

Published: May 2001

Volume 35, pages 81–94, (2001)

Computers and the Humanities

Abstract

The Gsearch system allows the selection of sentences by syntactic criteria from text corpora, even when these corpora contain no prior syntactic markup. This is achieved by means of a fast chart parser, which takes as input a grammar and a search expression specified by the user. Gsearch features a modular architecture that can be extended straightforwardly to give access to new corpora. The Gsearch architecture also allows interfacing with external linguistic resources (such as taggers and lexical databases). Gsearch can be used with graphical tools for visualizing the results of a query.
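The central idea of the abstract, applying a user-supplied grammar at query time to a corpus that carries only part-of-speech tags, can be illustrated with a small exhaustive bottom-up chart parser. This is a minimal sketch under stated assumptions, not Gsearch's implementation: the function names `build_chart` and `find_matches`, the dictionary-based grammar format and the Penn-style tags are hypothetical choices for illustration, and Gsearch's own grammar and query syntax are not reproduced here. The sketch records every category the grammar can build over every span, then reports the spans labelled with the searched-for category.

```python
"""Minimal bottom-up chart parser for finding constituents in a POS-tagged sentence.

Illustrative sketch only: the grammar format, tag set and function names are
hypothetical and do not reproduce Gsearch's grammar or query syntax.
"""

# Hypothetical grammar format: nonterminal -> list of right-hand sides.
# Symbols on a right-hand side are either POS tags or other nonterminals.
Grammar = dict[str, list[tuple[str, ...]]]


def build_chart(tags: list[str], grammar: Grammar) -> set[tuple[str, int, int]]:
    """Return every edge (category, start, end) derivable over the tag sequence."""
    n = len(tags)
    # Seed the chart with one edge per tagged token.
    chart = {(tag, i, i + 1) for i, tag in enumerate(tags)}
    changed = True
    while changed:  # repeat until a full pass adds no new edges (fixed point)
        changed = False
        # Index existing edges: (category, start) -> set of end positions.
        ends: dict[tuple[str, int], set[int]] = {}
        for cat, i, j in chart:
            ends.setdefault((cat, i), set()).add(j)
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                if not rhs:
                    raise ValueError(f"empty rule for {lhs} is not supported")
                for start in range(n):
                    # Positions reachable after matching the rhs symbols in order.
                    frontier = {start}
                    for sym in rhs:
                        frontier = {j for p in frontier for j in ends.get((sym, p), ())}
                        if not frontier:
                            break
                    for end in frontier:
                        if (lhs, start, end) not in chart:
                            chart.add((lhs, start, end))
                            changed = True
    return chart


def find_matches(tagged: list[tuple[str, str]], grammar: Grammar,
                 target: str) -> list[tuple[int, int]]:
    """Return sorted (start, end) token spans covered by the target category."""
    tags = [tag for _, tag in tagged]
    chart = build_chart(tags, grammar)
    return sorted((i, j) for cat, i, j in chart if cat == target)


# Hypothetical example data: a tagged sentence and a toy grammar.
SENTENCE = [("the", "DT"), ("old", "JJ"), ("man", "NN"),
            ("saw", "VBD"), ("a", "DT"), ("dog", "NN")]
GRAMMAR: Grammar = {
    "NP": [("DT", "NN"), ("DT", "JJ", "NN")],
    "VP": [("VBD", "NP")],
}

if __name__ == "__main__":
    # Print the words under each NP match found in the tagged sentence.
    for start, end in find_matches(SENTENCE, GRAMMAR, "NP"):
        print(" ".join(word for word, _ in SENTENCE[start:end]))
```

Run as a script, this prints `the old man` and `a dog`. Because the chart is refilled until no new edges appear, the `VP` over `saw a dog` is only found on the second pass, after the first pass has added the object `NP`. A production system would more likely use an agenda-driven or Earley-style strategy than this repeated rescanning of the whole chart.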

Author information

Authors and Affiliations

Sharp Laboratories of Europe, Oxford Science Park, Oxford, OX4 4GB, UK
Steffan Corley
Department of Psychology and Human Communication Research Centre, University of Edinburgh, 7 George Square, Edinburgh, EH8 9JZ, UK
Martin Corley
Institute for Communicating and Collaborative Systems, Division of Informatics, University of Edinburgh, 2 Buccleuch Place, Edinburgh, EH8 9LW, UK
Frank Keller
Computational Linguistics, Saarland University, Box 15 11 50, 66041, Saarbrücken, Germany
Matthew W. Crocker
Institute for Communicating and Collaborative Systems, Division of Informatics, University of Edinburgh, 80 South Bridge, Edinburgh, EH1 1HN, UK
Shari Trewin

About this article

Cite this article

Corley, S., Corley, M., Keller, F. et al. Finding Syntactic Structure in Unparsed Corpora: The Gsearch Corpus Query System. Computers and the Humanities 35, 81–94 (2001). https://doi.org/10.1023/A:1002497503122

Issue Date: May 2001
DOI: https://doi.org/10.1023/A:1002497503122