The Case for Multi-Engine Data Analytics

Tsoumakos, Dimitrios; Mantas, Christos

doi:10.1007/978-3-642-54420-0_40

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8374)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

As big data analytics has become an important driver of ICT development, a large variety of approaches that apply these advanced technologies to a wide spectrum of applications has been introduced. In this paper we argue for a multi-engine environment that exploits the widely differing models, costs and quality of the existing analytics engines. Such an environment further requires an intelligent management system for orchestrating and coordinating complex analytics tasks over the different available engines. After summarizing some of the current approaches in data analytics, we outline the structure of our envisioned Multi-Engine Management System and present some of the corresponding research directions in its design and development.
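To make the abstract's central idea concrete, the following is a minimal toy sketch of how a management layer might map analytics tasks onto engines that differ in model, cost and quality. It is not the paper's Multi-Engine Management System: the `Engine` and `Task` classes, their fields, the engine names, and all cost and quality numbers are hypothetical placeholders chosen for illustration. The selection rule (the cheapest engine that supports the task and meets a quality bound) is likewise an assumption.

```python
from dataclasses import dataclass

# Hypothetical engine profile; names, costs and quality scores are
# illustrative placeholders, not figures from the paper.
@dataclass(frozen=True)
class Engine:
    name: str
    operators: frozenset  # kinds of tasks the engine's model supports
    cost: float           # estimated cost per task (arbitrary units)
    quality: float        # expected result quality in [0, 1]

# Hypothetical task description: what to run and the minimum acceptable quality.
@dataclass(frozen=True)
class Task:
    name: str
    operator: str
    min_quality: float

def choose_engine(task, engines):
    """Return the cheapest engine that supports the task and meets its quality bound."""
    candidates = [e for e in engines
                  if task.operator in e.operators and e.quality >= task.min_quality]
    if not candidates:
        raise ValueError(f"no engine can run task {task.name!r}")
    return min(candidates, key=lambda e: e.cost)

def plan(workflow, engines):
    """Greedily assign each task in a workflow to an engine (task name -> engine name)."""
    return {t.name: choose_engine(t, engines).name for t in workflow}

# Example: three hypothetical engines with different models and trade-offs.
engines = [
    Engine("batch_mr", frozenset({"aggregate", "join"}), cost=1.0, quality=0.9),
    Engine("graph_bsp", frozenset({"pagerank"}), cost=2.0, quality=0.95),
    Engine("interactive_sql", frozenset({"aggregate"}), cost=3.0, quality=0.99),
]
workflow = [
    Task("daily_totals", "aggregate", 0.8),
    Task("exact_report", "aggregate", 0.95),
    Task("rank_pages", "pagerank", 0.9),
]

print(plan(workflow, engines))
# {'daily_totals': 'batch_mr', 'exact_report': 'interactive_sql', 'rank_pages': 'graph_bsp'}
```

The same "aggregate" operator lands on different engines depending on the quality each task demands, which is the kind of trade-off the abstract describes. The greedy per-task choice deliberately ignores the cost of moving data between engines and any interaction between workflow steps; a real orchestrator would have to model both.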

Author information

Authors and Affiliations

Computing Systems Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, Greece
Dimitrios Tsoumakos & Christos Mantas

Editor information

Editors and Affiliations

Rechen- und Kommunikationszentrum, RWTH Aachen, Seffenter Weg 23, 52074, Aachen, Germany
Dieter an Mey
TU Vienna, 1040, Vienna, Austria
Michael Alexander
RWTH Aachen University, Seffenter Weg 23, 52074, Aachen, Germany
Paolo Bientinesi & Carsten Clauss
University Magna Graecia of Catanzaro, 88100, Catanzaro, Italy
Mario Cannataro
Inria Rennes - Bretagne Atlantique, 35042, Rennes, France
Alexandru Costan & Christine Morin
University of Innsbruck, 6020, Innsbruck, Austria
Gabor Kecskemeti
Department of Computer Science, University of Pisa, 56126, Pisa, Italy
Laura Ricci
Universitat Politècnica de València, 46022, València, Spain
Julio Sahuquillo
LLNL, USA
Martin Schulz
Dipartimento di Informatica, Università di Salerno, 84084, Salerno, Italy
Vittorio Scarano
Tennessee Tech University and Oak Ridge National Laboratory, 38505, Cookeville, TN, USA
Stephen L. Scott
Technische Universität München, 80333, Munich, Germany
Josef Weidendorfer

About this paper

Cite this paper

Tsoumakos, D., Mantas, C. (2014). The Case for Multi-Engine Data Analytics. In: an Mey, D., et al. Euro-Par 2013: Parallel Processing Workshops. Euro-Par 2013. Lecture Notes in Computer Science, vol 8374. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54420-0_40

DOI: https://doi.org/10.1007/978-3-642-54420-0_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54419-4
Online ISBN: 978-3-642-54420-0
eBook Packages: Computer Science, Computer Science (R0)