Abstract
Many annotation tasks in computational linguistics are tackled with manually constructed pipelines of algorithms. In real-time tasks where information needs are stated and addressed ad hoc, however, manual construction is infeasible. This paper presents an artificial intelligence approach to automatically construct annotation pipelines for given information needs and quality prioritizations. Based on an abstract ontological model, we use partial order planning to select a pipeline’s algorithms and informed search to obtain an efficient pipeline schedule. We realized the approach as an expert system on top of Apache UIMA; this system offers evidence that pipelines can be constructed ad hoc in near-zero time.
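The abstract names two steps: selecting a pipeline's algorithms, then scheduling them efficiently. The Python sketch below illustrates both steps under simplifying assumptions that are not the paper's ontological model. Each algorithm is described only by the annotation types it requires and produces, an estimated run time, and a pass rate (the fraction of text it passes on to later algorithms). Selection uses plain backward chaining with run time as the only quality criterion, which is a simplified stand-in for partial order planning. Scheduling uses A* search over sets of already scheduled algorithms. All names (`Algorithm`, `select_algorithms`, `schedule`, and the example annotation types) are hypothetical and are not part of Apache UIMA.

```python
import heapq
import itertools
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Algorithm:
    """Hypothetical algorithm description (assumption, not the paper's model)."""
    name: str
    requires: frozenset  # annotation types needed as input
    produces: frozenset  # annotation types added as output
    time: float          # estimated run time on the full text
    pass_rate: float = 1.0  # fraction of text passed on; 1.0 = no filtering


def select_algorithms(goal, available, repository):
    """Backward chaining: for each needed type not yet provided, pick the
    fastest producer (a stand-in for a quality prioritization) and recurse
    on that producer's inputs."""
    selected = {}
    provided = set(available)
    open_types = list(goal)
    while open_types:
        t = open_types.pop()
        if t in provided:
            continue
        producers = [a for a in repository if t in a.produces]
        if not producers:
            raise ValueError(f"no algorithm produces {t!r}")
        best = min(producers, key=lambda a: a.time)
        selected[best.name] = best
        provided |= best.produces
        open_types.extend(best.requires)
    return list(selected.values())


def schedule(algorithms, available):
    """A* over sets of scheduled algorithms. Running an algorithm costs its
    time multiplied by the fraction of text left by earlier filters."""
    algos = tuple(algorithms)
    tie = itertools.count()  # tie-breaker so the heap never compares sets

    def fraction(done):
        return prod(a.pass_rate for a in done)

    def heuristic(done):
        rest = [a for a in algos if a not in done]
        # No remaining algorithm can see less text than this, so h is admissible.
        return fraction(done) * prod(a.pass_rate for a in rest) * sum(a.time for a in rest)

    def produced(done):
        types = set(available)
        for a in done:
            types |= a.produces
        return types

    start = frozenset()
    best_g = {start: 0.0}
    heap = [(heuristic(start), next(tie), 0.0, start, ())]
    while heap:
        _, _, g, done, order = heapq.heappop(heap)
        if g > best_g.get(done, float("inf")):
            continue  # stale heap entry
        if len(done) == len(algos):
            return list(order), g
        types = produced(done)
        for a in algos:
            if a in done or not a.requires <= types:
                continue  # already scheduled or inputs not yet available
            nxt = done | {a}
            ng = g + a.time * fraction(done)
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(heap, (ng + heuristic(nxt), next(tie), ng, nxt, order + (a.name,)))
    raise ValueError("dependencies cannot be satisfied")


if __name__ == "__main__":
    repo = [
        Algorithm("tokenizer", frozenset({"Text"}), frozenset({"Token"}), 1.0),
        Algorithm("money_recognizer", frozenset({"Token"}), frozenset({"MoneyEntity"}), 0.5, 0.2),
        Algorithm("time_recognizer", frozenset({"Token"}), frozenset({"TimeEntity"}), 2.0, 0.5),
        Algorithm("relation_finder", frozenset({"MoneyEntity", "TimeEntity"}), frozenset({"Forecast"}), 4.0),
    ]
    chosen = select_algorithms({"Forecast"}, {"Text"}, repo)
    order, cost = schedule(chosen, {"Text"})
    print(order, round(cost, 2))
    # prints: ['tokenizer', 'money_recognizer', 'time_recognizer', 'relation_finder'] 2.3
```

Because the fraction of text reaching an algorithm depends only on which filters have already run, not on their order, the set of scheduled algorithms suffices as a search state. Running the cheap, strongly filtering `money_recognizer` before the slower `time_recognizer` lets the later algorithms process less text, which is why the search prefers that order.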
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wachsmuth, H., Rose, M., Engels, G. (2013). Automatic Pipeline Construction for Real-Time Annotation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37247-6_4
DOI: https://doi.org/10.1007/978-3-642-37247-6_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37246-9
Online ISBN: 978-3-642-37247-6
eBook Packages: Computer Science, Computer Science (R0)