ABSTRACT
Query-driven analytics on scientific datasets is one of the fundamental approaches to scientific discovery. Existing studies have explored query-driven analytics on uniform-resolution meshes. However, querying adaptive mesh refinement (AMR) data has not yet been explored. As many simulations have recently been transitioning to AMR, new methods for efficient query-driven analysis of AMR data are needed.
In this paper, we present the first work to support scalable AMR-aware analysis. We propose an AMR-aware hybrid index that supports the two common forms of selection in query-driven analytics: spatial and value-based. To sustainably support future-scale analysis, we design an in situ (run-time) index building strategy that minimizes the performance impact on the co-located simulation. Additionally, we develop a parallel post-processing query method with an adaptive workload-balancing strategy. Using a Chombo-based benchmark, our evaluation demonstrates the scalability of our in situ indexing and querying methods up to 16,384 and 1,024 cores, respectively. Compared to non-AMR-aware indexing and querying, we demonstrate up to 12.4x and 500x performance improvement, respectively.
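To make the two selection forms concrete, the following is a minimal sketch of a combined spatial and value-based selection over a two-level AMR hierarchy. It is not the index proposed in the paper: the names `Box` and `hybrid_query` are hypothetical, a linear scan over boxes stands in for a real spatial index, and per-box minimum and maximum values stand in for a real value index. The sketch assumes a fixed refinement ratio between levels and excludes coarse cells that are covered by a finer box, which is one simple reading of "AMR-aware".

```python
from dataclasses import dataclass
import numpy as np


@dataclass
class Box:
    """Hypothetical AMR box: a rectangular patch of cells on one level."""
    level: int        # AMR level, 0 = coarsest
    lo: tuple         # inclusive lower corner, in this level's cell indices
    hi: tuple         # inclusive upper corner, in this level's cell indices
    data: np.ndarray  # cell values, one entry per cell in [lo, hi]

    @property
    def vmin(self):
        return float(self.data.min())

    @property
    def vmax(self):
        return float(self.data.max())


def intersect(lo_a, hi_a, lo_b, hi_b):
    """Intersection of two inclusive index ranges, or None if they are disjoint."""
    lo = tuple(max(a, b) for a, b in zip(lo_a, lo_b))
    hi = tuple(min(a, b) for a, b in zip(hi_a, hi_b))
    return (lo, hi) if all(l <= h for l, h in zip(lo, hi)) else None


def to_slices(rng, origin):
    """Convert an inclusive index range into array slices relative to a box origin."""
    return tuple(slice(l - o, h - o + 1) for l, h, o in zip(rng[0], rng[1], origin))


def hybrid_query(boxes, ratio, region_lo, region_hi, v_lo, v_hi):
    """Return (level, i, j, ...) for uncovered cells inside the region
    (given in level-0 cell indices) whose value lies in [v_lo, v_hi]."""
    hits = []
    for box in boxes:
        # Spatial pruning: rescale the region to this level and intersect.
        scale = ratio ** box.level
        q_lo = tuple(c * scale for c in region_lo)
        q_hi = tuple((c + 1) * scale - 1 for c in region_hi)
        overlap = intersect(box.lo, box.hi, q_lo, q_hi)
        if overlap is None:
            continue
        # Value pruning: skip boxes whose value range cannot match.
        if box.vmax < v_lo or box.vmin > v_hi:
            continue
        mask = (box.data >= v_lo) & (box.data <= v_hi)
        inside = np.zeros_like(mask)
        inside[to_slices(overlap, box.lo)] = True
        mask &= inside
        # AMR awareness: drop cells that a finer box already covers.
        for fine in boxes:
            if fine.level != box.level + 1:
                continue
            c_lo = tuple(c // ratio for c in fine.lo)
            c_hi = tuple(c // ratio for c in fine.hi)
            covered = intersect(box.lo, box.hi, c_lo, c_hi)
            if covered is not None:
                mask[to_slices(covered, box.lo)] = False
        for idx in np.argwhere(mask):
            hits.append((box.level, *(int(i) + o for i, o in zip(idx, box.lo))))
    return hits


# Illustrative data: a 4x4 coarse box and one 4x4 fine box refining its center.
coarse = Box(0, (0, 0), (3, 3), np.arange(16, dtype=float).reshape(4, 4))
fine = Box(1, (2, 2), (5, 5), np.full((4, 4), 100.0))
print(hybrid_query([coarse, fine], 2, (0, 0), (3, 3), 5.0, 10.0))
```

In this sketch a non-AMR-aware query would also report the coarse cells hidden under the fine box; masking them out avoids returning stale coarse values for regions that have been refined.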
Index Terms
- AMR-aware in situ indexing and scalable querying