Introducing a New Scalable Data-as-a-Service Cloud Platform for Enriching Traditional Text Mining Techniques by Integrating Ontology Modelling and Natural Language Processing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8182)

Abstract

A large share of the digital data produced in academia, commerce and industry consists of raw, unstructured text, such as Word documents, Excel tables, emails and web pages, much of it written in natural language. In many scientific and technological domains, an important analytical task is to retrieve information from such text in order to gain deeper insight into its content and to obtain useful knowledge and facts about a particular domain of interest that are often not stated explicitly. The major challenge lies in the size and structural complexity of the analysed text collections and in the frequency of their updates (i.e., the ‘big data’ aspect), which make traditional analysis techniques and tools infeasible. We introduce a new approach to analysing unstructured text data that improves traditional data mining techniques by adopting algorithms from ontological domain modelling, natural language processing and machine learning. The technique is designed for parallelism from the outset, which enables high performance on large-scale Cloud computing infrastructures.
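The abstract does not specify the platform's algorithms. Purely as an illustration of why ontology-based annotation of text lends itself to parallel execution, here is a minimal map/reduce-style sketch in Python. It assumes a hypothetical toy ontology (`ONTOLOGY`), a regex tokenizer standing in for a real NLP pipeline, and hypothetical function names (`annotate`, `merge`, `concept_frequencies`); it is not the authors' implementation.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical toy ontology: each concept maps to surface terms that denote it.
# A real system would derive this from a domain model (e.g. OWL/RDF), not a dict.
ONTOLOGY = {
    "Company": {"company", "firm", "enterprise"},
    "Product": {"product", "device", "software"},
    "Person": {"engineer", "manager", "customer"},
}

# Invert the ontology once so each token lookup is a single dict access.
TERM_TO_CONCEPT = {
    term: concept for concept, terms in ONTOLOGY.items() for term in terms
}

TOKEN_RE = re.compile(r"[a-z]+")


def annotate(document: str) -> Counter:
    """Map step: count the ontology concepts mentioned in one document."""
    # Lower-casing plus a regex tokenizer stands in for a real NLP pipeline
    # (sentence splitting, lemmatisation, named-entity recognition, ...).
    tokens = TOKEN_RE.findall(document.lower())
    return Counter(TERM_TO_CONCEPT[t] for t in tokens if t in TERM_TO_CONCEPT)


def merge(left: Counter, right: Counter) -> Counter:
    """Reduce step: combine partial concept counts from two workers."""
    return left + right


def concept_frequencies(documents: list[str], workers: int = 4) -> Counter:
    # Documents are independent, so the map step is embarrassingly parallel;
    # threads are used here only to keep the sketch portable.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(annotate, documents)
        return reduce(merge, partial_counts, Counter())


if __name__ == "__main__":
    docs = [
        "The company shipped a new device to every customer.",
        "An engineer at the firm reviewed the software product.",
        "The manager emailed the customer.",
    ]
    print(concept_frequencies(docs))
```

Because `annotate` keeps no shared state, the thread pool could in principle be replaced by processes or cluster nodes without changing the map and reduce functions; this is a design assumption of the sketch, not a description of the platform.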

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cheptsov, A., Tenschert, A., Schmidt, P., Glimm, B., Matthesius, M., Liebig, T. (2014). Introducing a New Scalable Data-as-a-Service Cloud Platform for Enriching Traditional Text Mining Techniques by Integrating Ontology Modelling and Natural Language Processing. In: Huang, Z., Liu, C., He, J., Huang, G. (eds) Web Information Systems Engineering – WISE 2013 Workshops. WISE 2013. Lecture Notes in Computer Science, vol 8182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54370-8_6

  • DOI: https://doi.org/10.1007/978-3-642-54370-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54369-2

  • Online ISBN: 978-3-642-54370-8

  • eBook Packages: Computer Science, Computer Science (R0)