DOI: 10.1145/3139958.3140019
research-article

SparkGIS: Resource Aware Efficient In-Memory Spatial Query Processing

Published: 07 November 2017

ABSTRACT

Much effort has been devoted to supporting high-performance spatial queries on large volumes of spatial data in distributed spatial computing systems, especially in the MapReduce paradigm. Recent work has focused on extending spatial MapReduce frameworks to leverage the high-performance in-memory distributed processing capabilities of systems such as Spark. However, the performance advantage comes with the requirement of having enough memory and a comprehensive configuration. When this requirement is not met, the system falls back to disk I/O, defeating the purpose of such systems, or in the worst case runs out of memory and fails the job. The problem is aggravated further for spatial processing, since the underlying in-memory systems are oblivious to spatial data features and characteristics. In this paper we present SparkGIS, an in-memory oriented spatial data querying system for high-throughput, low-latency spatial query handling that adapts Apache Spark's distributed processing capabilities. It supports basic spatial queries, including containment, spatial join, and k-nearest neighbor, and allows extending these to complex query pipelines. SparkGIS mitigates skew in distributed processing by supporting several dynamic partitioning algorithms suitable for a rich set of contemporary application scenarios. Multilevel global and local indexes, both pre-generated and built on demand in memory, allow SparkGIS to prune input data and apply compute-intensive operations only to the subset of relevant spatial objects. Finally, SparkGIS employs dynamic query rewriting to gracefully manage large spatial query workflows that exceed available distributed resources. Our comparative evaluation shows that the performance of SparkGIS is on par with contemporary Spark-based platforms for relatively small queries and that it outperforms them for larger data and memory-intensive workflows through dynamic query rewriting and efficient spatial data management.
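
The index-based pruning mentioned in the abstract can be illustrated with a small single-machine sketch. It is not SparkGIS code and does not use Spark: the fixed uniform grid, the function names (`cell_of`, `build_grid_index`, `containment_query`), and the sample points are all assumptions made for illustration. The idea shown is only the general one: a coarse index selects the cells that overlap the query window, and the exact containment test runs on the objects in those cells alone.

```python
from collections import defaultdict


def cell_of(x, y, cell_size):
    """Map a point to the integer grid cell that contains it (hypothetical helper)."""
    return (int(x // cell_size), int(y // cell_size))


def build_grid_index(points, cell_size):
    """Coarse index: grid cell -> list of points that fall in that cell."""
    index = defaultdict(list)
    for (x, y) in points:
        index[cell_of(x, y, cell_size)].append((x, y))
    return index


def containment_query(index, cell_size, rect):
    """Return points inside rect = (xmin, ymin, xmax, ymax), bounds inclusive.

    Only cells overlapping the rectangle are visited (pruning); the exact
    test is applied to the points stored in those cells.
    """
    xmin, ymin, xmax, ymax = rect
    cx0, cy0 = cell_of(xmin, ymin, cell_size)
    cx1, cy1 = cell_of(xmax, ymax, cell_size)
    hits = []
    for cx in range(cx0, cx1 + 1):
        for cy in range(cy0, cy1 + 1):
            # .get avoids creating empty entries in the defaultdict
            for (x, y) in index.get((cx, cy), []):
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    hits.append((x, y))
    return hits


# Illustrative data only.
points = [(0.5, 0.5), (1.5, 1.5), (2.5, 0.2), (9.0, 9.0)]
index = build_grid_index(points, cell_size=1.0)
result = sorted(containment_query(index, 1.0, (0.0, 0.0, 2.0, 2.0)))
```

In a distributed setting the grid cells would typically double as partitions, so a skewed dataset would call for cells of varying size; the uniform grid here is chosen only to keep the sketch short.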


            Published in

            SIGSPATIAL '17: Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
            November 2017
            677 pages
            ISBN:9781450354905
            DOI:10.1145/3139958

            Copyright © 2017 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Qualifiers

            • research-article
            • Research
            • Refereed limited

            Acceptance Rates

            SIGSPATIAL '17 paper acceptance rate: 39 of 193 submissions, 20%
            Overall acceptance rate: 220 of 1,116 submissions, 20%
