Abstract
As big data analytics has become an important driver of ICT development, a wide variety of approaches applying these technologies to a broad spectrum of applications has been introduced. In this paper we argue for a multi-engine environment that exploits the widely differing models, costs, and quality of existing analytics engines. Such an environment in turn requires an intelligent management system to orchestrate and coordinate complex analytics tasks across the available engines. After summarizing some current approaches to data analytics, we outline the structure of our envisioned Multi-Engine Management System and present some of the corresponding research directions for its design and development.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tsoumakos, D., Mantas, C. (2014). The Case for Multi-Engine Data Analytics. In: an Mey, D., et al. Euro-Par 2013: Parallel Processing Workshops. Euro-Par 2013. Lecture Notes in Computer Science, vol 8374. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54420-0_40
DOI: https://doi.org/10.1007/978-3-642-54420-0_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54419-4
Online ISBN: 978-3-642-54420-0
eBook Packages: Computer Science (R0)