Abstract
Prepositional phrase attachments are known to be an important source of errors in parsing natural language. In some cases, purely syntactic features are insufficient for disambiguating a prepositional phrase attachment, whereas visual features could help. In this work, we are interested in the impact of integrating such features into a parsing system. We propose a correction-strategy pipeline for prepositional attachments that uses visual information, trained on a multimodal corpus of images and captions. The evaluation of the system shows that visual features make it possible, in some cases, to correct the errors of a parser. It also helps to identify the most difficult aspects of such an integration.
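To make the idea concrete, here is a minimal, hypothetical sketch (not the authors' actual pipeline) of correcting a parser's PP-attachment choice by reranking candidate governors with a visual-compatibility score. The `Candidate`, `visual_score`, and `rerank` names, the linear combination, and the toy detector output are all illustrative assumptions:

```python
# Hypothetical sketch: rerank candidate PP-attachment heads by combining
# the parser's confidence with a visual feature from an object detector.
from dataclasses import dataclass

@dataclass
class Candidate:
    head: str            # candidate governor of the prepositional phrase
    parser_score: float  # confidence assigned by the syntactic parser

def visual_score(head, pp_object, detections):
    """Toy visual feature: 1.0 if the detector found the (head, object)
    pair related in the image, else 0.0."""
    return 1.0 if (head, pp_object) in detections else 0.0

def rerank(candidates, pp_object, detections, alpha=0.5):
    """Pick the head maximizing a linear mix of parser and visual scores."""
    return max(
        candidates,
        key=lambda c: (1 - alpha) * c.parser_score
                      + alpha * visual_score(c.head, pp_object, detections),
    ).head

# "A man sees a woman with a telescope": the parser slightly prefers noun
# attachment, but the image links the seeing event to the telescope.
cands = [Candidate("woman", 0.6), Candidate("sees", 0.4)]
dets = {("sees", "telescope")}  # assumed detector output
print(rerank(cands, "telescope", dets))  # -> sees
```

With these numbers, the visual evidence outweighs the parser's small preference for noun attachment, so the verb attachment is selected; without the visual term (alpha = 0), the parser's original, incorrect choice would stand.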
The work of Leonor Becerra-Bonache was performed during her teaching leave granted by the CNRS (French National Center for Scientific Research) at the Laboratoire d'Informatique et Systèmes of Aix-Marseille University.
© 2019 Springer Nature Switzerland AG
Cite this paper
Delecraz, S., Becerra-Bonache, L., Nasr, A., Béchet, F., Favre, B. (2019). Visual Disambiguation of Prepositional Phrase Attachments: Multimodal Machine Learning for Syntactic Analysis Correction. In: Rojas, I., Joya, G., Catala, A. (eds.) Advances in Computational Intelligence. IWANN 2019. Lecture Notes in Computer Science, vol. 11506. Springer, Cham. https://doi.org/10.1007/978-3-030-20521-8_52
Print ISBN: 978-3-030-20520-1
Online ISBN: 978-3-030-20521-8