An Analysis of Tree Topological Features in Classifier-Based Unlexicalized Parsing

Chan, Samuel W. K.; Chong, Mickey W. C.; Cheung, Lawrence Y. L.

doi:10.1007/978-3-642-19400-9_13

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6608)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

A novel set of “tree topological features” (TTFs) is investigated for improving a classifier-based unlexicalized parser. The features capture the location and shape of subtrees in the treebank. Four main categories of TTFs are proposed and compared. Experimental results show that each of the four categories independently yields a significant improvement in parsing accuracy over the baseline model. When the categories are combined using an ensemble technique, the best unlexicalized parser achieves 84% accuracy without any extra language resources, matching the performance of early lexicalized parsers. Linguistically, TTFs approximate notions such as grammatical weight, branching property and structural parallelism. This is illustrated by a study of how the features capture structural parallelism in the processing of coordinate structures.
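
The abstract does not define the four TTF categories, so the following Python sketch is only an illustration of what label-free "shape" descriptors of a subtree might look like. The tuple-based tree encoding, the function names (`weight`, `height`, `shape`, `branching_bias`) and the example trees are all assumptions introduced here, not the authors' feature definitions. The sketch loosely mirrors the three linguistic notions named above: grammatical weight (word count), branching property (whether the last child outweighs the first) and structural parallelism (two conjuncts sharing the same skeleton).

```python
from typing import Union

# Hypothetical encoding: a leaf is a word (str); an internal node is a
# tuple (label, child_1, ..., child_n). This is not the paper's format.
Tree = Union[str, tuple]


def weight(tree: Tree) -> int:
    """Number of words dominated by the node (a proxy for grammatical weight)."""
    if isinstance(tree, str):
        return 1
    return sum(weight(child) for child in tree[1:])


def height(tree: Tree) -> int:
    """Longest path from the node down to a word."""
    if isinstance(tree, str):
        return 0
    return 1 + max(height(child) for child in tree[1:])


def shape(tree: Tree):
    """Label-free skeleton: words become '*', nodes become tuples of child shapes."""
    if isinstance(tree, str):
        return "*"
    return tuple(shape(child) for child in tree[1:])


def branching_bias(tree: Tree) -> int:
    """Sum over nodes of +1 if the last child is heavier than the first, -1 if lighter."""
    if isinstance(tree, str):
        return 0
    children = tree[1:]
    score = 0
    if len(children) >= 2:
        first, last = weight(children[0]), weight(children[-1])
        score = (last > first) - (last < first)
    return score + sum(branching_bias(child) for child in children)


# Illustrative coordinate structure: "the cat and a dog".
coord = ("NP",
         ("NP", ("DT", "the"), ("NN", "cat")),
         ("CC", "and"),
         ("NP", ("DT", "a"), ("NN", "dog")))

# Illustrative right-branching verb phrase: "saw the dog".
vp = ("VP", ("VBD", "saw"), ("NP", ("DT", "the"), ("NN", "dog")))

# Two conjuncts count as structurally parallel here if their skeletons match.
conjuncts_parallel = shape(coord[1]) == shape(coord[3])
```

In this toy setting the two conjuncts of `coord` have identical skeletons, which is the kind of regularity a shape-based feature could expose to a classifier when it decides how to attach the second conjunct. How the paper actually encodes such information is not recoverable from the abstract.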

Author information

Authors and Affiliations

Dept. of Decision Sciences, Chinese University of Hong Kong, Shatin, Hong Kong SAR
Samuel W. K. Chan & Mickey W. C. Chong
Dept. of Linguistics & Modern Languages, Chinese University of Hong Kong, Shatin, Hong Kong SAR
Lawrence Y. L. Cheung

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico
Alexander F. Gelbukh

About this paper

Cite this paper

Chan, S.W.K., Chong, M.W.C., Cheung, L.Y.L. (2011). An Analysis of Tree Topological Features in Classifier-Based Unlexicalized Parsing. In: Gelbukh, A.F. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19400-9_13

DOI: https://doi.org/10.1007/978-3-642-19400-9_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19399-6
Online ISBN: 978-3-642-19400-9
eBook Packages: Computer Science, Computer Science (R0)
