research-article

SQL-on-Hadoop systems: tutorial

Published: 01 August 2015

Abstract

Enterprises are increasingly using Apache Hadoop, more specifically HDFS, as a central repository for all their data, which comes from many sources: operational systems, social media and the web, sensors and smart devices, and their own applications. At the same time, many enterprise data management tools (e.g., from SAP ERP and SAS to Tableau) rely on SQL, and many enterprise users are familiar and comfortable with SQL. As a result, SQL processing over Hadoop data has gained significant traction in recent years, and the number of systems that provide this capability has grown considerably. In this tutorial, we use the term SQL-on-Hadoop to refer to systems that provide some level of declarative SQL(-like) processing over HDFS and NoSQL data sources, using architectures that include computational or storage engines compatible with Apache Hadoop.



      • Published in

        Proceedings of the VLDB Endowment, Volume 8, Issue 12
        Proceedings of the 41st International Conference on Very Large Data Bases, Kohala Coast, Hawaii
        August 2015
        728 pages

        Publisher

        VLDB Endowment

