Introducing a New Scalable Data-as-a-Service Cloud Platform for Enriching Traditional Text Mining Techniques by Integrating Ontology Modelling and Natural Language Processing

Cheptsov, Alexey; Tenschert, Axel; Schmidt, Paul; Glimm, Birte; Matthesius, Mauricio; Liebig, Thorsten

doi:10.1007/978-3-642-54370-8_6

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8182)

Abstract

Much of the digital data produced in academia, commerce and industry consists of raw, unstructured text, such as Word documents, Excel tables, emails and web pages, often written in natural language. An important analytical task in many scientific and technological domains is to retrieve information from such text data, with the aim of gaining deeper insight into its content and obtaining useful knowledge and facts about a particular domain of interest that are often not stated explicitly. The major challenge is the size, the structural complexity and the update frequency of the analysed text sets (i.e., the 'big data' aspect), which rules out the use of traditional analysis techniques and tools. We introduce an innovative approach to analysing unstructured text data that improves traditional data mining techniques by adopting algorithms from ontological domain modelling, natural language processing and machine learning. The technique is designed with parallelism in mind, which allows for high performance on large-scale Cloud computing infrastructures.
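The combination described in the abstract can be illustrated with a minimal sketch. It is not the platform presented in the paper: the toy `ONTOLOGY` dictionary, the `annotate` and `analyse` functions and the regular-expression tokenizer are all hypothetical stand-ins, chosen only to show how per-document annotation against a domain model can run independently and then be merged. A thread pool stands in for the Cloud infrastructure; a real deployment would presumably distribute the same map and merge steps across processes or nodes.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable

# Hypothetical toy "ontology": surface term -> concept label.
# A real domain model would carry classes, relations and axioms.
ONTOLOGY = {
    "mapreduce": "ParallelProgrammingModel",
    "mpi": "ParallelProgrammingModel",
    "ontology": "KnowledgeModel",
    "cloud": "Infrastructure",
}


def annotate(document: str) -> Counter:
    """Map step: tokenize one document and count the concepts it mentions."""
    # Naive tokenizer (assumption): lowercase alphabetic runs only.
    tokens = re.findall(r"[a-z]+", document.lower())
    return Counter(ONTOLOGY[t] for t in tokens if t in ONTOLOGY)


def analyse(documents: Iterable[str], workers: int = 4) -> Counter:
    """Annotate documents independently, then merge the partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each document is processed without reference to the others,
        # which is what makes the workload easy to parallelize.
        for partial in pool.map(annotate, documents):
            total.update(partial)  # merge (reduce) step
    return total


if __name__ == "__main__":
    docs = [
        "MapReduce on the Cloud",
        "An ontology for MPI and MapReduce",
    ]
    print(analyse(docs))
```

Because each document is annotated in isolation, the only shared state is the final merge, so the same structure carries over to a process pool or a cluster scheduler without changing `annotate`.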

Author information

Authors and Affiliations

High-Performance Computing Center Stuttgart, Nobelstr. 19, 70569, Stuttgart, Germany
Alexey Cheptsov & Axel Tenschert
Institute of the Society for the Promotion of Applied Information Sciences, Saarland University, Martin-Luther-Str. 14, 66111, Saarbrücken, Germany
Paul Schmidt
Institute of Artificial Intelligence, University of Ulm, 89069, Ulm, Germany
Birte Glimm
Objectivity, Inc., 3099 North First Street, Suite 200, San Jose, CA, 95134, USA
Mauricio Matthesius
derivo GmbH, James-Franck-Ring, 89081, Ulm, Germany
Thorsten Liebig

Editor information

Editors and Affiliations

Vrije University of Amsterdam, De Boelelaan 1081, 1081, Amsterdam, The Netherlands
Zhisheng Huang
Faculty of Information and Communication Technologies, Swinburne University of Technology, PO Box 218, 3122, Melbourne, VIC, Australia
Chengfei Liu
College of Engineering and Science, Victoria University, PO Box 14428, 8001, Melbourne, VIC, Australia
Jing He
Centre for Applied Informatics, Victoria University, PO Box 14428, 8001, Melbourne, VIC, Australia
Guangyan Huang

About this paper

Cite this paper

Cheptsov, A., Tenschert, A., Schmidt, P., Glimm, B., Matthesius, M., Liebig, T. (2014). Introducing a New Scalable Data-as-a-Service Cloud Platform for Enriching Traditional Text Mining Techniques by Integrating Ontology Modelling and Natural Language Processing. In: Huang, Z., Liu, C., He, J., Huang, G. (eds) Web Information Systems Engineering – WISE 2013 Workshops. WISE 2013. Lecture Notes in Computer Science, vol 8182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54370-8_6

DOI: https://doi.org/10.1007/978-3-642-54370-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54369-2
Online ISBN: 978-3-642-54370-8
eBook Packages: Computer Science, Computer Science (R0)
