Optimization Factor Analysis of Large-Scale Join Queries on Different Platforms

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10179)

Abstract

Popular big data computing platforms, such as Spark, provide a new computing paradigm for traditional database operations, such as queries. Besides their ability to manage large-scale data, big data platforms have earned a reputation for simple programming interfaces and good scale-out performance. Traditional databases, however, have intrinsic optimization mechanisms for fundamental operators, which support efficient and flexible data processing. A comprehensive comparison of the data processing performance of these two kinds of platforms is therefore valuable. In this paper, we focus on the join operation, a primary and frequently used operator in both databases and big data analysis. We design and conduct extensive experiments to test the performance of the two classic platforms under unified datasets and hardware, which reveal how factors such as computing schema and storage media influence performance. Based on the experimental analysis, we also offer advice on choosing a computing platform for different application scenarios.

Acknowledgment

This study is supported by the National Natural Science Foundation of China (Nos. 61363005, 61462017, U1501252, 61662013), Guangxi Natural Science Foundation of China (Nos. 2014GXNSFAA118353, 2014GXNSFAA118390, 2014GXNSFDA118036), Guangxi Key Laboratory of Automatic Detection Technology and Instrument Foundation (YQ15110), Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, and the High Level Innovation Team of Colleges and Universities in Guangxi and Outstanding Scholars Program Funding.

Author information

Corresponding author

Correspondence to Jingwei Zhang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Yang, C., Wang, Q., Yang, Q., Zhang, H., Zhang, J., Zhou, Y. (2017). Optimization Factor Analysis of Large-Scale Join Queries on Different Platforms. In: Bao, Z., Trajcevski, G., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10179. Springer, Cham. https://doi.org/10.1007/978-3-319-55705-2_3

  • DOI: https://doi.org/10.1007/978-3-319-55705-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55704-5

  • Online ISBN: 978-3-319-55705-2

  • eBook Packages: Computer Science, Computer Science (R0)
