Abstract
The explosive growth of modern scientific data poses new challenges for storing and accessing data at very large (petabyte) scale. Traditional file systems and databases cannot meet the requirements of managing scientific data. Arrays are considered a natural data model for scientific data, and several science-oriented systems have been developed to handle the array data model. However, most of these systems use a “no overwrite” storage strategy, which makes their performance unstable across different applications. In this paper, we propose an application-aware storage strategy that gradually optimizes the data layout according to different access patterns. We implemented the strategy on top of SciDB by creating arrays with different indices for specific parts of the dataset. Experiments were conducted to verify the proposed strategy, and the results show that it improves the performance of a science-oriented database in supporting various kinds of applications.
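The idea of adapting a data layout to observed access patterns can be illustrated with a toy model. The sketch below is not the authors' SciDB implementation, and it does not use any SciDB API; the function names (chunks_touched, best_layout), the query format, and the candidate chunk shapes are all assumptions made for illustration. It counts how many fixed-size chunks of a 2-D array each rectangular query would read, then picks the chunk shape with the lowest total for a given access log.

```python
def chunks_touched(query, chunk_shape):
    """Count the chunks overlapped by a rectangular query.

    query is ((row_start, row_stop), (col_start, col_stop)) with
    half-open ranges; chunk_shape is (rows_per_chunk, cols_per_chunk).
    Both formats are hypothetical, chosen only for this sketch.
    """
    count = 1
    for (start, stop), size in zip(query, chunk_shape):
        first = start // size          # index of first chunk hit on this axis
        last = (stop - 1) // size      # index of last chunk hit on this axis
        count *= last - first + 1
    return count


def best_layout(access_log, candidate_shapes):
    """Return the candidate chunk shape that reads the fewest chunks
    in total over the logged queries (a simple stand-in cost model)."""
    def cost(shape):
        return sum(chunks_touched(q, shape) for q in access_log)
    return min(candidate_shapes, key=cost)


# Three candidate layouts for a 1000 x 1000 array, each with 10,000 cells per chunk.
candidates = [(10, 1000), (1000, 10), (100, 100)]

# Two hypothetical applications: one scans full rows, the other full columns.
row_scans = [((r, r + 1), (0, 1000)) for r in range(50)]
col_scans = [((0, 1000), (c, c + 1)) for c in range(50)]

print(best_layout(row_scans, candidates))              # row-shaped chunks win
print(best_layout(col_scans, candidates))              # column-shaped chunks win
print(best_layout(row_scans + col_scans, candidates))  # square chunks win
```

In this toy model no single layout is best for every workload, which is the motivation for choosing the layout per application rather than fixing it once.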
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, R. et al. (2012). Application-Aware Storage Strategy for Scientific Data. In: Khachidze, V., Wang, T., Siddiqui, S., Liu, V., Cappuccio, S., Lim, A. (eds) Contemporary Research on E-business Technology and Strategy. iCETS 2012. Communications in Computer and Information Science, vol 332. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34447-3_61
DOI: https://doi.org/10.1007/978-3-642-34447-3_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34446-6
Online ISBN: 978-3-642-34447-3
eBook Packages: Computer Science, Computer Science (R0)